Mixture Data for Training Cannot Ensure Out-of-distribution Generalization
arXiv (2023)
Abstract
Deep neural networks often struggle to generalize to out-of-distribution
(OOD) data, and a notable theoretical gap remains between the contributing
factors and their respective impacts. Evidence from in-distribution data
suggests that generalization error shrinks as the size of the mixture
training data grows. For OOD samples, however, this conventional
understanding no longer holds: increasing the size of the training data does
not always reduce the test generalization error. In fact, diverse error
trends have been observed across shifting scenarios, including decreases
following a power law, initial declines followed by increases, and
continuously stable patterns. Previous work has treated OOD data
qualitatively, merely as samples unseen during training, which makes these
complicated non-monotonic trends hard to explain. In this work, we
quantitatively redefine OOD data as points situated outside the convex hull
of the mixed training data and establish novel generalization error bounds
to better explain these counterintuitive observations. Our proof of the new
risk bound shows that the efficacy of well-trained models can be guaranteed
for unseen data within the convex hull; more interestingly, for OOD data
beyond this coverage, generalization cannot be ensured, which aligns with
our observations. Furthermore, we apply various OOD techniques to show that
our results not only explain insightful observations in recent OOD
generalization work, such as the significance of diverse data and the
sensitivity of existing algorithms to unseen shifts, but also inspire a
novel and effective data selection strategy.
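To make the paper's quantitative OOD criterion concrete, the sketch below
(not the authors' code; the function name and the choice of raw input space
are illustrative assumptions) tests whether a point lies inside the convex
hull of the training data by posing hull membership as a linear-programming
feasibility problem: does some convex combination of training points equal
the query point?

```python
# Minimal sketch of convex-hull membership, the paper's proposed OOD test:
# a point is OOD when no convex combination of training points reproduces it.
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x: np.ndarray, train: np.ndarray) -> bool:
    """Return True if x lies in the convex hull of the rows of `train`.

    Solves the LP feasibility problem:
        find lambda >= 0 with sum(lambda) = 1 and train.T @ lambda = x.
    The LP is feasible if and only if x is in the hull.
    """
    n = train.shape[0]
    # Stack the combination constraint with the simplex (sum-to-one) constraint.
    A_eq = np.vstack([train.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.success

# Example: corners of the unit square in 2-D.
train = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
print(in_convex_hull(np.array([0.5, 0.5]), train))  # True  -> inside the hull
print(in_convex_hull(np.array([1.5, 0.5]), train))  # False -> OOD by this definition
```

For high-dimensional data such as images, a test of this kind would
typically be applied in a learned feature space rather than in pixel space,
but the membership criterion itself is unchanged.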