Poster
AdaGrad stepsizes: sharp convergence over nonconvex landscapes
Rachel Ward · Xiaoxia Wu · Leon Bottou
Adaptive gradient methods such as AdaGrad and its variants update the stepsize in stochastic gradient descent on the fly according to the gradients received along the way; such methods have gained widespread use in large-scale optimization for their ability to converge robustly, without the need to fine-tune parameters such as the stepsize schedule. Yet, the theoretical guarantees to date for AdaGrad are for online and convex optimization. We bridge this gap by providing strong theoretical guarantees for the convergence of AdaGrad over smooth, nonconvex landscapes. We show that the norm version of AdaGrad (AdaGrad-Norm) converges to a stationary point at the $\mathcal{O}(\log(N)/\sqrt{N})$ rate in the stochastic setting, and at the optimal $\mathcal{O}(1/N)$ rate in the batch (non-stochastic) setting; in this sense, our convergence guarantees are "sharp". In particular, both our theoretical results and extensive numerical experiments imply that AdaGrad-Norm is robust to the unknown Lipschitz constant and level of stochastic noise on the gradient.
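As a rough illustration of the stepsize rule the abstract refers to, the sketch below implements AdaGrad-Norm in its commonly stated form: a single scalar accumulator $b_{j+1}^2 = b_j^2 + \|g_j\|^2$ followed by the update $x_{j+1} = x_j - (\eta / b_{j+1})\, g_j$. The function name `adagrad_norm`, the default values of `eta` and `b0`, and the quadratic test problem are assumptions made for this example, not taken from the poster page.

```python
import math


def adagrad_norm(grad, x0, eta=1.0, b0=0.1, num_steps=1000):
    """Minimal AdaGrad-Norm sketch: one scalar stepsize eta / b for all coordinates.

    grad: callable returning the gradient (a list of floats) at a point x.
    x0:   starting point (list of floats).
    eta, b0: illustrative defaults, not values taken from the paper.
    """
    x = list(x0)
    b_sq = b0 ** 2  # accumulator seeded with b0^2
    for _ in range(num_steps):
        g = grad(x)
        # Accumulate the squared gradient norm before taking the step.
        b_sq += sum(gi * gi for gi in g)
        step = eta / math.sqrt(b_sq)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, math.sqrt(b_sq)


# Hypothetical batch (noise-free) test problem: f(x) = 0.5 * L * ||x||^2.
# The smoothness constant L is never passed to the optimizer.
def quad_grad(x, L=50.0):
    return [L * xi for xi in x]


x_final, b_final = adagrad_norm(quad_grad, x0=[1.0, -2.0], num_steps=500)
grad_norm = math.sqrt(sum(g * g for g in quad_grad(x_final)))
```

The optimizer never sees the smoothness constant `L`; the accumulator `b` grows until the effective stepsize `eta / b` is small enough for the iterates to contract. This is the kind of robustness to an unknown Lipschitz constant that the abstract describes, shown here only on a toy problem.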
Author Information
Rachel Ward (University of Texas)
Xiaoxia Wu (The University of Texas at Austin)
The department of mathematics
Leon Bottou (Facebook)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: AdaGrad stepsizes: sharp convergence over nonconvex landscapes
  Tue. Jun 11th through Wed. Jun 12th, Room Hall B
More from the Same Authors
- 2022: Discussion Panel
  Percy Liang · Léon Bottou · Jayashree Kalpathy-Cramer · Alex Smola
- 2022 Poster: Rich Feature Construction for the Optimization-Generalization Dilemma
  Jianyu Zhang · David Lopez-Paz · Léon Bottou
- 2022 Spotlight: Rich Feature Construction for the Optimization-Generalization Dilemma
  Jianyu Zhang · David Lopez-Paz · Léon Bottou
- 2020: Q&A with Rachel Ward
  Rachel Ward
- 2020: Talk by Rachel Ward - Weighted Optimization: better generalization by smoother interpolation
  Rachel Ward
- 2019 Poster: First-Order Adversarial Vulnerability of Neural Networks and Input Dimension
  Carl-Johann Simon-Gabriel · Yann Ollivier · Leon Bottou · Bernhard Schölkopf · David Lopez-Paz
- 2019 Oral: First-Order Adversarial Vulnerability of Neural Networks and Input Dimension
  Carl-Johann Simon-Gabriel · Yann Ollivier · Leon Bottou · Bernhard Schölkopf · David Lopez-Paz
- 2017 Poster: Wasserstein Generative Adversarial Networks
  Martin Arjovsky · Soumith Chintala · Léon Bottou
- 2017 Talk: Wasserstein Generative Adversarial Networks
  Martin Arjovsky · Soumith Chintala · Léon Bottou