The Sample Complexity of Gradient Descent in Stochastic Convex Optimization
arXiv (2024)
Abstract
We analyze the sample complexity of full-batch Gradient Descent (GD) in the
setup of non-smooth Stochastic Convex Optimization. We show that the
generalization error of GD, with (minimax) optimal choice of hyper-parameters,
can be Θ̃(d/m + 1/√(m)), where d is the dimension and m is
the sample size. This matches the sample complexity of worst-case
empirical risk minimizers. That means that, in contrast with other algorithms,
GD has no advantage over naive ERMs. Our bound follows from a new
generalization bound that depends on the dimension as well as on the learning
rate and the number of iterations. Our bound also shows that, for general
hyper-parameters, when the dimension is strictly larger than the number of samples,
T = Ω(1/ϵ^4) iterations are necessary to avoid overfitting. This
resolves an open problem by , and
improves over previous lower bounds, which showed that the sample size must
be at least the square root of the dimension.
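
To make the setting concrete, here is a minimal sketch, not the paper's construction, of the algorithm under analysis: full-batch (sub)gradient descent on the empirical risk of a non-smooth convex loss. The hinge loss, the synthetic data, and the hyper-parameter values (learning rate eta and iteration count T) below are illustrative assumptions; they only indicate which quantities, beyond the dimension d and sample size m, the bound above depends on.

import numpy as np

def full_batch_gd(X, y, eta=0.1, T=1000):
    # T steps of full-batch subgradient descent on the average hinge loss
    # (1/m) * sum_i max(0, 1 - y_i <w, x_i>), a standard non-smooth convex objective.
    # eta and T are the hyper-parameters referred to in the abstract (illustrative values).
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(T):
        margins = y * (X @ w)
        active = (margins < 1).astype(float)   # examples contributing a nonzero subgradient
        subgrad = -(active * y) @ X / m        # subgradient of the empirical risk at w
        w -= eta * subgrad
    return w

# Illustrative usage on synthetic data in the regime the abstract studies (d > m).
rng = np.random.default_rng(0)
m, d = 50, 200
X = rng.standard_normal((m, d))
y = np.sign(rng.standard_normal(m))
w_hat = full_batch_gd(X, y, eta=0.1, T=1000)
print("empirical hinge risk:", np.mean(np.maximum(0.0, 1 - y * (X @ w_hat))))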