Oral
AdaGrad stepsizes: sharp convergence over nonconvex landscapes
Rachel Ward · Xiaoxia Wu · Leon Bottou
Adaptive gradient methods such as AdaGrad and its variants update the stepsize in stochastic gradient descent on the fly according to the gradients received along the way; such methods have gained widespread use in large-scale optimization for their ability to converge robustly, without the need to fine-tune parameters such as the stepsize schedule. Yet, the theoretical guarantees for AdaGrad to date are for online and convex optimization. We bridge this gap by providing strong theoretical guarantees for the convergence of AdaGrad over smooth, nonconvex landscapes. We show that AdaGrad converges to a stationary point at the optimal $O(1/\sqrt{N})$ rate (up to a $\log(N)$ factor), and at the optimal $O(1/N)$ rate in the non-stochastic setting. In particular, both our theoretical and numerical results imply that AdaGrad is robust to the \emph{unknown Lipschitz constant and level of stochastic noise on the gradient}, in a near-optimal sense.
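For readers unfamiliar with the update described above, here is a minimal sketch of a scalar (norm-style) AdaGrad stepsize, in which a running sum of squared gradient norms sets the stepsize on the fly, so no Lipschitz constant or noise level needs to be known in advance. The names (adagrad_norm, grad_fn, eta, b0) and the test objective are illustrative assumptions, not taken from the authors' code.

import numpy as np

def adagrad_norm(grad_fn, x0, eta=1.0, b0=0.1, num_steps=1000):
    """Illustrative sketch: SGD with a single adaptive stepsize driven by
    the accumulated squared gradient norms (not the authors' reference code)."""
    x = np.asarray(x0, dtype=float)
    b_sq = b0 ** 2                         # running sum of squared gradient norms
    for _ in range(num_steps):
        g = grad_fn(x)                     # full or stochastic gradient at the iterate
        b_sq += float(np.dot(g, g))        # accumulate ||g||^2 "on the fly"
        x = x - (eta / np.sqrt(b_sq)) * g  # stepsize shrinks automatically over time
    return x

if __name__ == "__main__":
    # Smooth, nonconvex test objective f(x) = sum_i (0.1 * x_i^2 + sin(x_i)),
    # with gradient 0.2 * x + cos(x); used here only to exercise the sketch.
    grad = lambda x: 0.2 * x + np.cos(x)
    x_final = adagrad_norm(grad, x0=3.0 * np.ones(10))
    print(np.linalg.norm(grad(x_final)))   # gradient norm at the returned point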
Author Information
Rachel Ward (University of Texas)
Xiaoxia Wu (The University of Texas at Austin)
Department of Mathematics
Leon Bottou (Facebook)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: AdaGrad stepsizes: sharp convergence over nonconvex landscapes »
  Wed. Jun 12th 01:30 -- 04:00 AM, Pacific Ballroom #56
More from the Same Authors
- 2022: Discussion Panel »
  Percy Liang · Léon Bottou · Jayashree Kalpathy-Cramer · Alex Smola
- 2022 Poster: Rich Feature Construction for the Optimization-Generalization Dilemma »
  Jianyu Zhang · David Lopez-Paz · Léon Bottou
- 2022 Spotlight: Rich Feature Construction for the Optimization-Generalization Dilemma »
  Jianyu Zhang · David Lopez-Paz · Léon Bottou
- 2020: Q&A with Rachel Ward »
  Rachel Ward
- 2020: Talk by Rachel Ward - Weighted Optimization: better generalization by smoother interpolation »
  Rachel Ward
- 2019 Poster: First-Order Adversarial Vulnerability of Neural Networks and Input Dimension »
  Carl-Johann Simon-Gabriel · Yann Ollivier · Leon Bottou · Bernhard Schölkopf · David Lopez-Paz
- 2019 Oral: First-Order Adversarial Vulnerability of Neural Networks and Input Dimension »
  Carl-Johann Simon-Gabriel · Yann Ollivier · Leon Bottou · Bernhard Schölkopf · David Lopez-Paz
- 2017 Poster: Wasserstein Generative Adversarial Networks »
  Martin Arjovsky · Soumith Chintala · Léon Bottou
- 2017 Talk: Wasserstein Generative Adversarial Networks »
  Martin Arjovsky · Soumith Chintala · Léon Bottou