LOCOST: State-Space Models for Long Document Abstractive Summarization
CoRR(2024)
摘要
State-space models are a low-complexity alternative to transformers for
encoding long sequences and capturing long-term dependencies. We propose
LOCOST: an encoder-decoder architecture based on state-space models for
conditional text generation with long context inputs. With a computational
complexity of O(L log L), this architecture can handle significantly longer
sequences than state-of-the-art models that are based on sparse attention
patterns. We evaluate our model on a series of long document abstractive
summarization tasks. The model reaches a performance level that is 93-96
comparable to the top-performing sparse transformers of the same size while
saving up to 50
Additionally, LOCOST effectively handles input texts exceeding 600K tokens at
inference time, setting new state-of-the-art results on full-book summarization
and opening new perspectives for long input processing.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要