From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction
arxiv(2023)
摘要
Foundation models have been transformational in machine learning fields such
as natural language processing and computer vision. Similar success in atomic
property prediction has been limited due to the challenges of training
effective models across multiple chemical domains. To address this, we
introduce Joint Multi-domain Pre-training (JMP), a supervised pre-training
strategy that simultaneously trains on multiple datasets from different
chemical domains, treating each dataset as a unique pre-training task within a
multi-task framework. Our combined training dataset consists of ∼120M
systems from OC20, OC22, ANI-1x, and Transition-1x. We evaluate performance and
generalization by fine-tuning over a diverse set of downstream tasks and
datasets including: QM9, rMD17, MatBench, QMOF, SPICE, and MD22. JMP
demonstrates an average improvement of 59
matches or sets state-of-the-art on 34 out of 40 tasks. Our work highlights the
potential of pre-training strategies that utilize diverse data to advance
property prediction across chemical domains, especially for low-data tasks.
Please visit https://nima.sh/jmp for further information.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要