Variational Learning is Effective for Large Deep Networks
CoRR(2024)
Abstract
We give extensive empirical evidence against the common belief that
variational learning is ineffective for large neural networks. We show that an
optimizer called Improved Variational Online Newton (IVON) consistently matches
or outperforms Adam for training large networks such as GPT-2 and ResNets from
scratch. IVON's computational costs are nearly identical to Adam's, but its
predictive uncertainty is better. We show several new use cases of IVON, where
we improve fine-tuning and model merging in Large Language Models, accurately
predict generalization error, and faithfully estimate sensitivity to data. We
find overwhelming evidence in support of the effectiveness of variational learning.