Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems
CoRR (2024)
Abstract
Stochastic gradient descent (SGD) exhibits strong algorithmic regularization
effects in practice and plays an important role in the generalization of modern
machine learning. However, prior research has revealed instances where the
generalization performance of SGD is worse than ridge regression due to uneven
optimization along different dimensions. Preconditioning offers a natural
solution to this issue by rebalancing optimization across different directions.
Yet, the extent to which preconditioning can enhance the generalization
performance of SGD and whether it can bridge the existing gap with ridge
regression remain unclear. In this paper, we study the generalization
performance of SGD with preconditioning for the least squares problem. We make
a comprehensive comparison between preconditioned SGD and (standard &
preconditioned) ridge regression. Our study makes several key contributions
toward understanding and improving SGD with preconditioning. First, we
establish excess risk bounds (generalization performance) for preconditioned
SGD and ridge regression under an arbitrary preconditioning matrix. Second,
leveraging the excess risk characterization of preconditioned SGD and ridge
regression, we show by construction that there exists a simple
preconditioning matrix that can outperform (standard & preconditioned) ridge
regression. Finally, we show that our proposed preconditioning matrix is
straightforward enough to allow robust estimation from finite samples while
maintaining a theoretical advantage over ridge regression. Our empirical
results align with our theoretical findings, collectively showcasing the
enhanced regularization effect of preconditioned SGD.
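
To make the setting concrete, below is a minimal sketch of preconditioned SGD on a synthetic least-squares problem, compared against a closed-form ridge regression baseline. The diagonal preconditioner P, the learning rate, and the helper names (preconditioned_sgd, ridge) are illustrative assumptions for demonstration only; they are not the specific construction or the estimation procedure analyzed in the paper.

```python
# Sketch: preconditioned SGD vs. ridge regression on least squares.
# The preconditioner here is a simple (assumed) inverse-diagonal rescaling
# of the empirical covariance, chosen only to illustrate how preconditioning
# rebalances optimization across directions with uneven scales.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with an uneven feature spectrum: y = X w_star + noise
n, d = 200, 20
X = rng.normal(size=(n, d)) * np.linspace(3.0, 0.1, d)
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

# Illustrative diagonal preconditioner (hypothetical choice, not the paper's)
diag_cov = np.mean(X**2, axis=0)
P = np.diag(1.0 / (diag_cov + 1e-3))

def preconditioned_sgd(X, y, P, lr=0.01, epochs=50):
    """SGD with the preconditioned update w <- w - lr * P @ grad_i."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            grad_i = (X[i] @ w - y[i]) * X[i]  # per-sample squared-loss gradient
            w -= lr * P @ grad_i
    return w

def ridge(X, y, lam=1.0):
    """Closed-form ridge regression baseline."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_sgd = preconditioned_sgd(X, y, P)
w_ridge = ridge(X, y)
# Excess risk proxy: mean squared prediction error of the parameter gap
print("preconditioned SGD:", np.mean((X @ (w_sgd - w_star)) ** 2))
print("ridge regression:  ", np.mean((X @ (w_ridge - w_star)) ** 2))
```

The key design point the sketch illustrates is that the preconditioner enters only through the update direction P @ grad_i, so directions with small covariance eigenvalues receive proportionally larger steps, which is the rebalancing effect the abstract refers to.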