Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect.

Lorenzo Noci,Kevin Roth,Gregor Bachmann,Sebastian Nowozin,Thomas Hofmann

Annual Conference on Neural Information Processing Systems（2021）

引用 14|浏览22

暂无评分

摘要

The “cold posterior effect” (CPE) in Bayesian deep learning describes the disturbing observation that the predictive performance of Bayesian neural networks can be significantly improved if the Bayes posterior is artificially sharpened using a temperature parameter T <1. The CPE is problematic in theory and practice and since the effect was identified many researchers have proposed hypotheses to explain the phenomenon. However, despite this intensive research effort the effect remains poorly understood. In this work we provide novel and nuanced evidence relevant to existing explanations for the cold posterior effect, disentangling three hypotheses: 1. The dataset curation hypothesis of Aitchison (2020): we show empirically that the CPE does not arise in a real curated data set but can be produced in a controlled experiment with varying curation strength. 2. The data augmentation hypothesis of Izmailov et al. (2021) and Fortuin et al. (2021): we show empirically that data augmentation is sufficient but not necessary for the CPE to be present. 3. The bad prior hypothesis of Wenzel et al. (2020): we use a simple experiment evaluating the relative importance of the prior and the likelihood, strongly linking the CPE to the prior. Our results demonstrate how the CPE can arise in isolation from synthetic curation, data augmentation, and bad priors. Cold posteriors observed “in the wild” are therefore unlikely to arise from a single simple cause; as a result, we do not expect a simple “fix” for cold posteriors.

查看译文

关键词

prior,effect,cold,curation,data-augmentation

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要