Predicting the Performance of Foundation Models via Agreement-on-the-Line
arXiv (2024)
Abstract
Estimating out-of-distribution (OOD) performance in regimes where labels are scarce is critical to safely deploying foundation models. Recently, it was shown that ensembles of neural networks exhibit the phenomenon of "agreement-on-the-line," which can be leveraged to reliably predict OOD performance without labels. However, in contrast to classical neural networks that are trained on in-distribution data from scratch for numerous epochs, foundation models undergo minimal finetuning from heavily pretrained weights, which may reduce the ensemble diversity needed to observe agreement-on-the-line. In our work, we first demonstrate that when lightly finetuning multiple runs from a single foundation model, the source of randomness during training (linear head initialization, data ordering, and data subsetting) can lead to drastically different levels of agreement-on-the-line in the resulting ensemble. Surprisingly, only random head initialization reliably induces agreement-on-the-line in finetuned foundation models across vision and language benchmarks. Second, we demonstrate that ensembles of multiple foundation models pretrained on different datasets but finetuned on the same task can also show agreement-on-the-line. In total, by carefully constructing a diverse ensemble, we can apply agreement-on-the-line-based methods to predict the OOD performance of foundation models with high precision.
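To make the agreement-on-the-line recipe the abstract refers to concrete, the sketch below estimates each ensemble member's OOD accuracy without OOD labels: it fits a line between ID and OOD pairwise agreement (in probit scale, following the ALine-style estimators of Baek et al., 2022) and then maps each model's labeled ID accuracy through that line. This is a minimal illustrative sketch, not the authors' released code; the function names and array shapes are assumptions.

```python
import numpy as np
from scipy.stats import norm


def pairwise_agreement(preds):
    """Fraction of examples on which each pair of models agrees.

    preds: (n_models, n_examples) array of predicted class labels.
    Returns one agreement rate per unordered model pair."""
    n = preds.shape[0]
    return np.array([
        (preds[i] == preds[j]).mean()
        for i in range(n) for j in range(i + 1, n)
    ])


def probit(p, eps=1e-6):
    # The probit transform linearizes accuracy/agreement trends
    # under distribution shift (Baek et al., 2022).
    return norm.ppf(np.clip(p, eps, 1 - eps))


def predict_ood_accuracy(id_preds, ood_preds, id_labels):
    """Estimate per-model OOD accuracy without OOD labels.

    Agreement-on-the-line: when the ID-vs-OOD *agreement* of model
    pairs lies on a line in probit scale, the same line maps ID
    *accuracy* to OOD accuracy. Needs at least 3 models so the fit
    uses 3+ pairs."""
    # 1) Fit the line on (ID agreement, OOD agreement) pairs.
    x = probit(pairwise_agreement(id_preds))
    y = probit(pairwise_agreement(ood_preds))
    slope, intercept = np.polyfit(x, y, deg=1)

    # 2) Map each model's labeled ID accuracy through the fitted line.
    id_acc = (id_preds == id_labels).mean(axis=1)
    return norm.cdf(slope * probit(id_acc) + intercept)
```

In the paper's setting, the ensemble fed to such an estimator would itself be built by finetuning a single foundation model several times while varying only the random seed of the linear-head initialization, the one source of diversity the abstract identifies as reliably inducing agreement-on-the-line.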