Timezone: »
Spotlight
Understanding the unstable convergence of gradient descent
Kwangjun Ahn · Jingzhao Zhang · Suvrit Sra
Most existing analyses of (stochastic) gradient descent rely on the condition that for $L$-smooth costs, the step size is less than $2/L$. However, many works have observed that in machine learning applications step sizes often do not fulfill this condition, yet (stochastic) gradient descent still converges, albeit in an unstable manner. We investigate this unstable convergence phenomenon from first principles, and discuss key causes behind it. We also identify its main characteristics, and how they interrelate based on both theory and experiments, offering a principled view toward understanding the phenomenon.
Author Information
Kwangjun Ahn (MIT EECS)
Jingzhao Zhang (Tsinghua University)
Suvrit Sra (MIT & Macro-Eyes)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Poster: Understanding the unstable convergence of gradient descent »
Tue. Jul 19th through Wed the 20th Room Hall E #607
More from the Same Authors
-
2023 Poster: Global optimality for Euclidean CCCP under Riemannian convexity »
Melanie Weber · Suvrit Sra -
2023 Poster: On the Training Instability of Shuffling SGD with Batch Normalization »
David X. Wu · Chulhee Yun · Suvrit Sra -
2022 : Sign and Basis Invariant Networks for Spectral Graph Representation Learning »
Derek Lim · Joshua Robinson · Lingxiao Zhao · Tess Smidt · Suvrit Sra · Haggai Maron · Stefanie Jegelka -
2022 Poster: Agnostic Learnability of Halfspaces via Logistic Loss »
Ziwei Ji · Kwangjun Ahn · Pranjal Awasthi · Satyen Kale · Stefani Karp -
2022 Poster: Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity »
Jingzhao Zhang · Hongzhou Lin · Subhro Das · Suvrit Sra · Ali Jadbabaie -
2022 Oral: Agnostic Learnability of Halfspaces via Logistic Loss »
Ziwei Ji · Kwangjun Ahn · Pranjal Awasthi · Satyen Kale · Stefani Karp -
2022 Spotlight: Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity »
Jingzhao Zhang · Hongzhou Lin · Subhro Das · Suvrit Sra · Ali Jadbabaie -
2022 Poster: Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective »
Jingzhao Zhang · Haochuan Li · Suvrit Sra · Ali Jadbabaie -
2022 Spotlight: Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective »
Jingzhao Zhang · Haochuan Li · Suvrit Sra · Ali Jadbabaie -
2021 Poster: Provably Efficient Algorithms for Multi-Objective Competitive RL »
Tiancheng Yu · Yi Tian · Jingzhao Zhang · Suvrit Sra -
2021 Poster: Online Learning in Unknown Markov Games »
Yi Tian · Yuanhao Wang · Tiancheng Yu · Suvrit Sra -
2021 Spotlight: Online Learning in Unknown Markov Games »
Yi Tian · Yuanhao Wang · Tiancheng Yu · Suvrit Sra -
2021 Oral: Provably Efficient Algorithms for Multi-Objective Competitive RL »
Tiancheng Yu · Yi Tian · Jingzhao Zhang · Suvrit Sra -
2021 Poster: Three Operator Splitting with a Nonconvex Loss Function »
Alp Yurtsever · Varun Mangalick · Suvrit Sra -
2021 Spotlight: Three Operator Splitting with a Nonconvex Loss Function »
Alp Yurtsever · Varun Mangalick · Suvrit Sra -
2020 Poster: Strength from Weakness: Fast Learning Using Weak Supervision »
Joshua Robinson · Stefanie Jegelka · Suvrit Sra -
2020 Poster: Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions »
Jingzhao Zhang · Hongzhou Lin · Stefanie Jegelka · Suvrit Sra · Ali Jadbabaie -
2020 Poster: Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition »
Chi Jin · Tiancheng Jin · Haipeng Luo · Suvrit Sra · Tiancheng Yu -
2019 Poster: Escaping Saddle Points with Adaptive Gradient Methods »
Matthew Staib · Sashank Jakkam Reddi · Satyen Kale · Sanjiv Kumar · Suvrit Sra -
2019 Oral: Escaping Saddle Points with Adaptive Gradient Methods »
Matthew Staib · Sashank Jakkam Reddi · Satyen Kale · Sanjiv Kumar · Suvrit Sra -
2019 Poster: Random Shuffling Beats SGD after Finite Epochs »
Jeff HaoChen · Suvrit Sra -
2019 Oral: Random Shuffling Beats SGD after Finite Epochs »
Jeff HaoChen · Suvrit Sra -
2019 Poster: Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator »
Alp Yurtsever · Suvrit Sra · Volkan Cevher -
2019 Oral: Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator »
Alp Yurtsever · Suvrit Sra · Volkan Cevher