MEMORY-VQ: Compression for Tractable Internet-Scale Memory

CoRR (2023)

Abstract
Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods like LUMEN pre-compute token representations for retrieved passages to drastically speed up inference. However, memory also leads to much greater storage requirements from storing pre-computed representations. We propose MEMORY-VQ, a new method to reduce storage requirements of memory-augmented models without sacrificing performance. Our method uses a vector quantization variational autoencoder (VQ-VAE) to compress token representations. We apply MEMORY-VQ to the LUMEN model to obtain LUMEN-VQ, a memory model that achieves a 16x compression rate with comparable performance on the KILT benchmark. LUMEN-VQ enables practical retrieval augmentation even for extremely large retrieval corpora.
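The abstract gives no implementation details, so the following is a minimal NumPy sketch of the kind of codebook quantization MEMORY-VQ describes: each pre-computed token representation is split into sub-vectors, and each sub-vector is replaced by the index of its nearest entry in a learned codebook, so that only small integer codes are stored. The codebooks here are random rather than trained, and all sizes (a 1024-dimensional representation, 256 subspaces, 256-entry codebooks) are illustrative assumptions chosen so the stored codes are 16x smaller than the float32 originals; the paper's actual configuration may differ.

import numpy as np

# Illustrative sizes (assumptions, not the paper's actual configuration).
D = 1024             # token representation width
NUM_SUBSPACES = 256  # sub-vectors per token (product-quantization style)
CODEBOOK_SIZE = 256  # 256 entries -> one uint8 code per sub-vector
SUB_D = D // NUM_SUBSPACES

rng = np.random.default_rng(0)
# One codebook per subspace; in a real VQ-VAE these would be trained.
codebooks = rng.standard_normal(
    (NUM_SUBSPACES, CODEBOOK_SIZE, SUB_D)).astype(np.float32)

def quantize(tokens: np.ndarray) -> np.ndarray:
    """Map (n, D) float32 representations to (n, NUM_SUBSPACES) uint8 codes."""
    subs = tokens.reshape(len(tokens), NUM_SUBSPACES, SUB_D)
    codes = np.empty((len(tokens), NUM_SUBSPACES), dtype=np.uint8)
    for s in range(NUM_SUBSPACES):
        # Squared L2 distance from each sub-vector to every codebook entry.
        d = ((subs[:, s, None, :] - codebooks[s][None]) ** 2).sum(-1)
        codes[:, s] = d.argmin(-1)
    return codes

def dequantize(codes: np.ndarray) -> np.ndarray:
    """Reconstruct approximate (n, D) representations by codebook lookup."""
    parts = [codebooks[s][codes[:, s]] for s in range(NUM_SUBSPACES)]
    return np.concatenate(parts, axis=1)

tokens = rng.standard_normal((8, D)).astype(np.float32)
codes = quantize(tokens)          # what would be stored in memory
approx = dequantize(codes)        # what the reader consumes at inference

ratio = tokens.nbytes / codes.nbytes
print(f"compression ratio: {ratio:.0f}x")  # 4096 bytes -> 256 bytes = 16x

The design point this illustrates: the memory for a large retrieval corpus then grows only with the small integer codes, while dequantization at inference time is a cheap table lookup per sub-vector.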
Keywords
compression, internet-scale