Lossy Compression Options for Dense Index Retention

Joel Mackenzie,Alistair Moffat

ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL IN THE ASIA PACIFIC REGION, SIGIR-AP 2023（2023）

引用 0|浏览6

暂无评分

摘要

Dense indexes derived from whole-of-document neural models are now more effective at locating likely-relevant documents than are conventional term-based inverted indexes. That effectiveness comes at a cost, however: inverted indexes require less than a byte per posting to store, whereas dense indexes store a fixed-length vector of floating point coefficients (typically 768) for each document, making them potentially an order of magnitude larger. In this paper we consider compression of indexes employing dense vectors. Only limited space savings can be achieved via lossless compression techniques, but we demonstrate that dense indexes are responsive to lossy techniques that sacrifice controlled amounts of numeric resolution in order to gain compressibility. We describe suitable schemes, and, via experiments on three different collections, show that substantial space savings can be achieved with minimal loss of ranking fidelity. These techniques further boost the attractiveness of dense indexes for practical use.

查看译文

关键词

Index compression,dense indexing,lossy compression

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要