TRELM: Towards Robust and Efficient Pre-training for Knowledge-Enhanced Language Models
CoRR (2024)
Abstract
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) utilize external
knowledge to enhance language understanding. Previous language models facilitated knowledge
acquisition by incorporating knowledge-related pre-training tasks learned from
relation triples in knowledge graphs. However, these models do not prioritize
learning embeddings for entity-related tokens. Moreover, updating the entire
set of parameters in KEPLMs is computationally demanding. This paper introduces
TRELM, a Robust and Efficient Pre-training framework for Knowledge-Enhanced
Language Models. We observe that entities in text corpora usually follow a
long-tail distribution, so the representations of some entities are learned
suboptimally, which hinders the pre-training process for KEPLMs. To
tackle this, we employ a robust approach to inject knowledge triples and
a knowledge-augmented memory bank to capture valuable information. Furthermore,
updating a small subset of neurons in the feed-forward networks (FFNs) that
store factual knowledge is both sufficient and efficient. Specifically, we
utilize dynamic knowledge routing to identify knowledge paths in FFNs and
selectively update parameters during pre-training. Experimental results show
that TRELM reduces pre-training time by at least 50% and outperforms other
KEPLMs in knowledge probing tasks and multiple knowledge-aware language
understanding tasks.
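
To make the selective-update idea concrete, the following is a minimal PyTorch-style sketch, not the paper's implementation: it keeps a transformer FFN mostly frozen and lets gradients flow only through a small set of high-activation neurons, a stand-in for the "knowledge paths" the abstract describes. The FFN module, the activation-based selection rule in select_knowledge_neurons, the mask_ffn_gradients helper, and the top_k parameter are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch (assumption, not TRELM's actual routing): restrict
# pre-training updates to a small subset of FFN neurons via gradient masking.

class FFN(nn.Module):
    def __init__(self, d_model=768, d_ff=3072):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)
        self.act = nn.GELU()

    def forward(self, x):
        return self.w_out(self.act(self.w_in(x)))

def select_knowledge_neurons(ffn, calibration_batch, top_k=64):
    # Hypothetical selection rule: rank FFN neurons by mean activation
    # magnitude on a knowledge-bearing (e.g. entity-rich) calibration batch.
    with torch.no_grad():
        acts = ffn.act(ffn.w_in(calibration_batch))   # (batch, seq, d_ff)
        scores = acts.abs().mean(dim=(0, 1))          # (d_ff,)
    return torch.topk(scores, top_k).indices

def mask_ffn_gradients(ffn, neuron_idx):
    # Zero the weight gradients of all FFN neurons except the selected ones,
    # so an optimizer step only touches the chosen "knowledge path".
    # (Biases are left unmasked here for brevity.)
    keep = torch.zeros(ffn.w_in.out_features, dtype=torch.bool)
    keep[neuron_idx] = True

    def hook_in(grad):    # grad of w_in.weight, shape (d_ff, d_model)
        return grad * keep.unsqueeze(1)

    def hook_out(grad):   # grad of w_out.weight, shape (d_model, d_ff)
        return grad * keep.unsqueeze(0)

    ffn.w_in.weight.register_hook(hook_in)
    ffn.w_out.weight.register_hook(hook_out)

# Usage: select neurons once (or periodically), then train as usual.
ffn = FFN()
calib = torch.randn(8, 128, 768)                      # stand-in for entity-rich text features
idx = select_knowledge_neurons(ffn, calib, top_k=64)
mask_ffn_gradients(ffn, idx)

out = ffn(torch.randn(8, 128, 768))
out.sum().backward()   # only the selected neurons receive nonzero weight gradients
```

Under this assumption, the efficiency gain comes from the optimizer effectively updating only a small fraction of FFN weights per step; the actual path-selection criterion in TRELM (dynamic knowledge routing) is described in the paper itself.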