Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
CoRR (2024)
Abstract
State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed
as alternatives to Transformer networks in language modeling, by incorporating
gating, convolutions, and input-dependent token selection to mitigate the
quadratic cost of multi-head attention. Although SSMs exhibit competitive
performance, their in-context learning (ICL) capabilities, a remarkable
emergent property of modern language models that enables task execution without
parameter optimization, remain underexplored compared to Transformers. In this
study, we evaluate the ICL performance of SSMs, focusing on Mamba, against
Transformer models across various tasks. Our results show that SSMs perform
comparably to Transformers in standard regression ICL tasks, while
outperforming them in tasks like sparse parity learning. However, SSMs fall
short in tasks involving non-standard retrieval functionality. To address these
limitations, we introduce a hybrid model, MambaFormer, that combines Mamba with
attention blocks, surpassing individual models in tasks where they struggle
independently. Our findings suggest that hybrid architectures offer promising
avenues for enhancing ICL in language models.
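
To make the hybrid idea concrete, below is a minimal PyTorch sketch that interleaves a standard multi-head attention sub-block with a simplified gated, convolutional, input-dependently decaying recurrence (the three SSM ingredients named in the abstract). This is an illustrative stand-in under stated assumptions, not the paper's MambaFormer implementation; the class names SimpleSSMBlock and HybridBlock and all hyperparameters are placeholders.

# Minimal sketch, assuming a toy stand-in for a Mamba block rather than the
# paper's actual MambaFormer code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleSSMBlock(nn.Module):
    """Toy selective-SSM-style block: depthwise conv + gated recurrent scan."""
    def __init__(self, d_model: int, d_conv: int = 4):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.conv = nn.Conv1d(d_model, d_model, d_conv,
                              padding=d_conv - 1, groups=d_model)
        self.decay = nn.Linear(d_model, d_model)  # input-dependent "selection"
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                          # x: (batch, seq, d_model)
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        u = self.conv(u.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        a = torch.sigmoid(self.decay(x))           # per-token decay in (0, 1)
        h = torch.zeros_like(u[:, 0])
        states = []
        for t in range(u.size(1)):                 # sequential scan, O(seq)
            h = a[:, t] * h + (1.0 - a[:, t]) * u[:, t]
            states.append(h)
        y = torch.stack(states, dim=1) * F.silu(gate)
        return self.out_proj(y)

class HybridBlock(nn.Module):
    """Pre-norm attention sub-block followed by the SSM-style sub-block."""
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ssm = SimpleSSMBlock(d_model)

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                           # residual around attention
        x = x + self.ssm(self.norm2(x))            # residual around SSM block
        return x

# Usage example with arbitrary shapes:
x = torch.randn(2, 16, 64)                         # (batch, seq, d_model)
print(HybridBlock(64)(x).shape)                    # torch.Size([2, 16, 64])

The intent of the sketch is only to show how attention and an SSM-style recurrence can coexist within one block so that each mechanism can cover the other's weak spots (retrieval-style lookups for attention, long-sequence recurrent processing for the SSM); the paper's actual layer ordering and block design may differ.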