Informer: Transformer Likes Informed Attention

arXiv (2021)

Abstract
The Transformer is the backbone of modern NLP models. In this paper, we propose Informer, a simple architecture that significantly outperforms canonical Transformers on a spectrum of tasks including Masked Language Modeling, GLUE, and SQuAD. Qualitatively, Informer is easy to implement and requires minimal hyper-parameter tuning. It also stabilizes training and yields models with sparser attention distributions. Code will be open-sourced upon paper acceptance.
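For context, below is a minimal sketch of the canonical scaled dot-product attention that Informer is compared against. This is the standard formulation from the original Transformer, not the paper's informed-attention mechanism, which the abstract does not describe; the function name and the NumPy implementation are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Canonical attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K: arrays of shape (seq_len, d_k); V: (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq_len, seq_len) similarities
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of values

# Toy self-attention: 4 tokens, model dimension 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

In the canonical model these softmax weights are typically dense; the abstract's claim is that Informer's training produces noticeably sparser attention distributions than this baseline.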
Keywords
transformer