In practical applications of iterative first-order optimization, the learning rate schedule remains notoriously difficult to understand and expensive to tune. We demonstrate the presence of these subtleties even in the innocuous case when the objective is a convex quadratic. We reinterpret an iterative algorithm from the numerical analysis literature as what we call the Chebyshev learning rate schedule for accelerating vanilla gradient descent, and show that the problem of mitigating instability leads to a fractal ordering of step sizes. We provide some experiments to challenge conventional beliefs about stable learning rates in deep learning: the fractal schedule enables training to converge with locally unstable updates which make negative progress on the objective.
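The following is a minimal NumPy sketch of the setting the abstract describes. It is not the authors' code, and it rests on several assumptions. The objective is a diagonal convex quadratic whose eigenvalues fill an assumed interval [m, M]. The step sizes are the reciprocals of the textbook Chebyshev nodes on that interval. `fractal_order` is a hypothetical recursive interleaving that stands in for a fractal ordering; the paper's exact permutation may differ. Because the factors (I - η_i A) commute, the step order does not change the final iterate in exact arithmetic, and the error is at most ‖x0 - x*‖ / T_n((M+m)/(M-m)). The order only changes the intermediate trajectory and how rounding errors or noise are amplified along the way. Note also that the largest Chebyshev step, about 1/m, far exceeds the usual stability threshold 2/M.

```python
import numpy as np


def chebyshev_step_sizes(m, M, n):
    """Reciprocals of the degree-n Chebyshev roots mapped from [-1, 1] onto [m, M]."""
    i = np.arange(1, n + 1)
    roots = (M + m) / 2 + (M - m) / 2 * np.cos((2 * i - 1) * np.pi / (2 * n))
    return 1.0 / roots  # sorted from smallest step (~1/M) to largest (~1/m)


def fractal_order(n):
    """Hypothetical recursive interleaving of 0..n-1 (n a power of 2):
    order(1) = [0]; order(2k) interleaves order(k) with its mirror 2k-1-j."""
    if n == 1:
        return [0]
    half = fractal_order(n // 2)
    mirrored = [n - 1 - j for j in half]
    return [j for pair in zip(half, mirrored) for j in pair]


def gd_quadratic(A, b, x0, steps):
    """Vanilla GD on f(x) = 0.5 x^T A x - b^T x, one step size per iteration.
    Returns the final iterate and the largest error norm seen along the way."""
    x_star = np.linalg.solve(A, b)
    x = x0.copy()
    peak = np.linalg.norm(x - x_star)
    for eta in steps:
        x = x - eta * (A @ x - b)  # the gradient of f is A x - b
        peak = max(peak, np.linalg.norm(x - x_star))
    return x, peak


rng = np.random.default_rng(0)
d, m, M, n = 50, 0.01, 1.0, 16       # condition number M / m = 100; n is a power of 2
A = np.diag(np.linspace(m, M, d))     # assumed spectrum spread over [m, M]
b = rng.standard_normal(d)
x0 = np.zeros(d)
x_star = np.linalg.solve(A, b)


def err(x):
    return np.linalg.norm(x - x_star)


steps = chebyshev_step_sizes(m, M, n)
x_const, _ = gd_quadratic(A, b, x0, np.full(n, 2.0 / (m + M)))
x_inc, peak_inc = gd_quadratic(A, b, x0, steps)
x_frac, peak_frac = gd_quadratic(A, b, x0, steps[fractal_order(n)])

print("largest step:", steps.max(), "stability threshold 2/M:", 2.0 / M)
print("error, constant step 2/(m+M):", err(x_const))
print("error, Chebyshev steps (increasing / interleaved order):", err(x_inc), err(x_frac))
print("peak error along the way (increasing / interleaved order):", peak_inc, peak_frac)
```

With these assumed settings, the Chebyshev schedule finishes with a smaller error than the constant step 2/(m+M), even though several of its steps are individually unstable. The two orderings agree at the end up to rounding. Comparing the printed peak errors shows how the ordering alone reshapes the transient.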
Author Information
Naman Agarwal (Google Research)
Surbhi Goel (Microsoft Research)
Cyril Zhang (Microsoft Research)
Related Events (a corresponding poster, oral, or spotlight)
2021 Spotlight: Acceleration via Fractal Learning Rate Schedules
Tue. Jul 20th 01:35 -- 01:40 PM Room
More from the Same Authors
2021 : Sparsity in the Partially Controllable LQR
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
2022 Social: Mental Health in ML Academia
Paula Gradu · Cyril Zhang
2022 Poster: Sparsity in Partially Controllable Linear Systems
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
2022 Poster: Understanding Contrastive Learning Requires Incorporating Inductive Biases
Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy
2022 Spotlight: Sparsity in Partially Controllable Linear Systems
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
2022 Spotlight: Understanding Contrastive Learning Requires Incorporating Inductive Biases
Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy
2022 Poster: Inductive Biases and Variable Creation in Self-Attention Mechanisms
Benjamin Edelman · Surbhi Goel · Sham Kakade · Cyril Zhang
2022 Spotlight: Inductive Biases and Variable Creation in Self-Attention Mechanisms
Benjamin Edelman · Surbhi Goel · Sham Kakade · Cyril Zhang
2021 Poster: Statistical Estimation from Dependent Data
Vardis Kandiros · Yuval Dagan · Nishanth Dikkala · Surbhi Goel · Constantinos Daskalakis
2021 Spotlight: Statistical Estimation from Dependent Data
Vardis Kandiros · Yuval Dagan · Nishanth Dikkala · Surbhi Goel · Constantinos Daskalakis
2021 Poster: A Regret Minimization Approach to Iterative Learning Control
Naman Agarwal · Elad Hazan · Anirudha Majumdar · Karan Singh
2021 Spotlight: A Regret Minimization Approach to Iterative Learning Control
Naman Agarwal · Elad Hazan · Anirudha Majumdar · Karan Singh
2020 Poster: Learning Mixtures of Graphs from Epidemic Cascades
Jessica Hoffmann · Soumya Basu · Surbhi Goel · Constantine Caramanis
2020 Poster: Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent
Surbhi Goel · Aravind Gollakota · Zhihan Jin · Sushrut Karmalkar · Adam Klivans
2020 Poster: Efficiently Learning Adversarially Robust Halfspaces with Noise
Omar Montasser · Surbhi Goel · Ilias Diakonikolas · Nati Srebro
2020 Poster: Boosting for Control of Dynamical Systems
Naman Agarwal · Nataly Brukhim · Elad Hazan · Zhou Lu
2019 Poster: Efficient Full-Matrix Adaptive Regularization
Naman Agarwal · Brian Bullins · Xinyi Chen · Elad Hazan · Karan Singh · Cyril Zhang · Yi Zhang
2019 Poster: Online Control with Adversarial Disturbances
Naman Agarwal · Brian Bullins · Elad Hazan · Sham Kakade · Karan Singh
2019 Oral: Efficient Full-Matrix Adaptive Regularization
Naman Agarwal · Brian Bullins · Xinyi Chen · Elad Hazan · Karan Singh · Cyril Zhang · Yi Zhang
2019 Oral: Online Control with Adversarial Disturbances
Naman Agarwal · Brian Bullins · Elad Hazan · Sham Kakade · Karan Singh
2018 Poster: Learning One Convolutional Layer with Overlapping Patches
Surbhi Goel · Adam Klivans · Raghu Meka
2018 Oral: Learning One Convolutional Layer with Overlapping Patches
Surbhi Goel · Adam Klivans · Raghu Meka