Informer: Transformer Likes Informed Attention

arXiv (2021)

Abstract
The Transformer is the backbone of modern NLP models. In this paper, we propose Informer, a simple architecture that significantly outperforms canonical Transformers on a spectrum of tasks including Masked Language Modeling, GLUE, and SQuAD. Qualitatively, Informer is easy to implement and requires minimal hyper-parameter tuning. It also stabilizes training and yields models with sparser attention distributions. Code will be open-sourced upon paper acceptance.
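For context, below is a minimal sketch of the canonical scaled dot-product attention that Informer is compared against. This is the standard formulation from the original Transformer, not the paper's informed-attention mechanism, which the abstract does not describe; the function name and the NumPy implementation are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Canonical attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K: arrays of shape (seq_len, d_k); V: (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq_len, seq_len) similarities
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of values

# Toy self-attention: 4 tokens, model dimension 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

In the canonical model these softmax weights are typically dense; the abstract's claim is that Informer's training produces noticeably sparser attention distributions than this baseline.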
Keywords
transformer