The Sample Complexity of Gradient Descent in Stochastic Convex Optimization
arXiv (2024)
Abstract
We analyze the sample complexity of full-batch Gradient Descent (GD) in the
setup of non-smooth Stochastic Convex Optimization. We show that the
generalization error of GD, with (minimax) optimal choice of hyper-parameters,
can be Θ̃(d/m + 1/√(m)), where d is the dimension and m is
the sample size. This matches the sample complexity of worst-case
empirical risk minimizers. That means that, in contrast with other algorithms,
GD has no advantage over naive ERMs. Our bound follows from a new
generalization bound that depends on the dimension as well as on the learning
rate and the number of iterations. Our bound also shows that, for general
hyper-parameters, when the dimension is strictly larger than the number of samples,
T = Ω(1/ϵ^4) iterations are necessary to avoid overfitting. This
resolves an open problem by , and
improves over previous lower bounds, which showed that the sample size must
be at least the square root of the dimension.
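
To make the setting concrete, here is a minimal sketch, not the paper's construction, of the algorithm under analysis: full-batch (sub)gradient descent on the empirical risk of a non-smooth convex loss. The hinge loss, the synthetic data, and the hyper-parameter values (learning rate eta and iteration count T) below are illustrative assumptions; they only indicate which quantities, beyond the dimension d and sample size m, the bound above depends on.

import numpy as np

def full_batch_gd(X, y, eta=0.1, T=1000):
    # T steps of full-batch subgradient descent on the average hinge loss
    # (1/m) * sum_i max(0, 1 - y_i <w, x_i>), a standard non-smooth convex objective.
    # eta and T are the hyper-parameters referred to in the abstract (illustrative values).
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(T):
        margins = y * (X @ w)
        active = (margins < 1).astype(float)   # examples contributing a nonzero subgradient
        subgrad = -(active * y) @ X / m        # subgradient of the empirical risk at w
        w -= eta * subgrad
    return w

# Illustrative usage on synthetic data in the regime the abstract studies (d > m).
rng = np.random.default_rng(0)
m, d = 50, 200
X = rng.standard_normal((m, d))
y = np.sign(rng.standard_normal(m))
w_hat = full_batch_gd(X, y, eta=0.1, T=1000)
print("empirical hinge risk:", np.mean(np.maximum(0.0, 1 - y * (X @ w_hat))))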