LOCOST: State-Space Models for Long Document Abstractive Summarization

CoRR (2024)

Abstract
State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of O(L log L), this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches a performance level that is 93-96% comparable to the top-performing sparse transformers of the same size while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles input texts exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.