Oral
An Alternative View: When Does SGD Escape Local Minima?
Bobby Kleinberg · Yuanzhi Li · Yang Yuan
Stochastic gradient descent (SGD) is widely used in machine learning. Although it is commonly viewed as a fast but less accurate version of gradient descent (GD), it always finds better solutions than GD for modern neural networks. To understand this phenomenon, we take the alternative view that SGD works on a convolved (thus smoothed) version of the loss function. We show that, even if the function $f$ has many bad local minima or saddle points, as long as, for every point $x$, the weighted average of the gradients in its neighborhood is one-point convex with respect to the desired solution $x^*$, SGD will get close to, and then stay around, $x^*$ with constant probability. Our result identifies a set of functions on which SGD provably works, which is much larger than the set of convex functions. Empirically, we observe that the loss surfaces of neural networks enjoy nice one-point convexity properties locally; our theorem therefore helps explain why SGD works so well for neural networks.
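The smoothing view can be illustrated with a toy 1-D experiment (a sketch under illustrative assumptions, not the paper's construction): a quadratic valley overlaid with high-frequency wiggles has many spurious local minima, yet the gradient averaged over a small Gaussian neighborhood is approximately $2x$, i.e. one-point convex toward $x^* = 0$. Plain GD gets trapped in a wiggle, while a perturbed-gradient stand-in for SGD noise drifts to the valley floor. The frequency, amplitude, step size, and noise scale below are all arbitrary choices for the demo.

```python
import numpy as np

# Toy loss f(x) = x^2 + 0.5*sin(25x): global minimum near x* = 0,
# plus many spurious local minima from the sine wiggles.
# (All constants here are illustrative, not from the paper.)
def grad(x):
    # f'(x) = 2x + 12.5*cos(25x)
    return 2.0 * x + 12.5 * np.cos(25.0 * x)

rng = np.random.default_rng(0)
lr, steps = 0.002, 5000

# Plain GD: descends into the nearest wiggle and stays there.
x_gd = 3.0
for _ in range(steps):
    x_gd -= lr * grad(x_gd)

# SGD-like perturbed GD: evaluate the gradient at a randomly perturbed
# point, so each step follows (in expectation) the convolved gradient,
# which is roughly 2x here -- one-point convex with respect to x* = 0.
x, trace = 3.0, []
for _ in range(steps):
    x -= lr * grad(x + rng.normal(scale=0.2))
    trace.append(x)
x_sgd = np.mean(trace[-1000:])  # average the last iterates

print(f"GD ends at  {x_gd:+.3f} (stuck in a spurious local minimum)")
print(f"SGD ends at {x_sgd:+.3f} (near the global minimum x* = 0)")
```

Averaging the final iterates is a common way to report the output of a noisy method; the key point is that the perturbed iterates concentrate around $x^*$ while GD's do not.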
Author Information
Bobby Kleinberg (Cornell)
Yuanzhi Li (Princeton University)
Yang Yuan (Cornell University)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Poster: An Alternative View: When Does SGD Escape Local Minima? »
  Wed. Jul 11th 04:15 -- 07:00 PM, Room Hall B #85
More from the Same Authors
- 2019 Poster: Direct Uncertainty Prediction for Medical Second Opinions »
  Maithra Raghu · Katy Blumer · Rory Sayres · Ziad Obermeyer · Bobby Kleinberg · Sendhil Mullainathan · Jon Kleinberg
- 2019 Oral: Direct Uncertainty Prediction for Medical Second Opinions »
  Maithra Raghu · Katy Blumer · Rory Sayres · Ziad Obermeyer · Bobby Kleinberg · Sendhil Mullainathan · Jon Kleinberg
- 2018 Poster: Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits »
  Zeyuan Allen-Zhu · Sebastien Bubeck · Yuanzhi Li
- 2018 Poster: Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games? »
  Maithra Raghu · Alexander Irpan · Jacob Andreas · Bobby Kleinberg · Quoc Le · Jon Kleinberg
- 2018 Oral: Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits »
  Zeyuan Allen-Zhu · Sebastien Bubeck · Yuanzhi Li
- 2018 Oral: Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games? »
  Maithra Raghu · Alexander Irpan · Jacob Andreas · Bobby Kleinberg · Quoc Le · Jon Kleinberg
- 2018 Poster: The Well-Tempered Lasso »
  Yuanzhi Li · Yoram Singer
- 2018 Oral: The Well-Tempered Lasso »
  Yuanzhi Li · Yoram Singer
- 2017 Poster: Near-Optimal Design of Experiments via Regret Minimization »
  Zeyuan Allen-Zhu · Yuanzhi Li · Aarti Singh · Yining Wang
- 2017 Talk: Near-Optimal Design of Experiments via Regret Minimization »
  Zeyuan Allen-Zhu · Yuanzhi Li · Aarti Singh · Yining Wang
- 2017 Poster: Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition »
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2017 Poster: Faster Principal Component Regression and Stable Matrix Chebyshev Approximation »
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2017 Talk: Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition »
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2017 Talk: Faster Principal Component Regression and Stable Matrix Chebyshev Approximation »
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2017 Poster: Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU »
  Zeyuan Allen-Zhu · Yuanzhi Li
- 2017 Poster: Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations »
  Yuanzhi Li · Yingyu Liang
- 2017 Talk: Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations »
  Yuanzhi Li · Yingyu Liang
- 2017 Talk: Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU »
  Zeyuan Allen-Zhu · Yuanzhi Li