The speed of gradient descent for convex Lipschitz functions is highly dependent on the choice of learning rate. Setting the learning rate to achieve the optimal convergence rate requires knowing the distance D from the initial point to the solution set. In this work, we describe a single-loop method, with no back-tracking or line searches, which does not require knowledge of D yet asymptotically achieves the optimal rate of convergence for the complexity class of convex Lipschitz functions. Our approach is the first parameter-free method for this class without additional multiplicative log factors in the convergence rate. We present extensive experiments for SGD and Adam variants of our method, in which it automatically matches hand-tuned learning rates across more than a dozen diverse machine learning problems, including large-scale vision and language problems. Our method is practical and efficient, and it requires no additional function-value or gradient evaluations per step. An implementation is provided in the supplementary material.
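To make the idea concrete, below is a minimal Python sketch of a D-Adaptation-style step size for subgradient descent: it maintains a running lower bound d on the unknown distance D = ||x0 - x*|| (using the convexity-based bound D >= (sum_i eta_i <g_i, x0 - x_i>) / ||sum_i eta_i g_i||) and scales an AdaGrad-norm step size by d. This is an illustrative simplification in the spirit of the method, not the paper's exact algorithm; the function and parameter names (d_adapted_sgd, d0, lr, n_steps) are invented for this sketch, and the authoritative implementation is the one provided in the supplementary material.

# Illustrative sketch only, not the paper's verbatim pseudocode: estimate a
# lower bound d on the distance D and use step size lr * d / sqrt(sum ||g||^2).
import numpy as np

def d_adapted_sgd(grad, x0, n_steps=1000, d0=1e-6, lr=1.0):
    """Minimise a convex function given a (sub)gradient oracle `grad`."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    d = d0                                   # running lower bound on D = ||x0 - x*||
    g_sq_sum = 0.0                           # sum_i ||g_i||^2
    weighted_grad_sum = np.zeros_like(x)     # sum_i eta_i * g_i
    numerator = 0.0                          # sum_i eta_i * <g_i, x0 - x_i>
    for _ in range(n_steps):
        g = np.asarray(grad(x), dtype=float)
        g_sq_sum += float(g @ g)
        eta = lr * d / (np.sqrt(g_sq_sum) + 1e-12)
        # Accumulate the quantities certifying a lower bound on D for convex f:
        #   D >= (sum_i eta_i <g_i, x0 - x_i>) / ||sum_i eta_i g_i||.
        numerator += eta * float(g @ (x0 - x))
        weighted_grad_sum += eta * g
        denom = float(np.linalg.norm(weighted_grad_sum))
        if denom > 0:
            d = max(d, numerator / denom)    # never shrink the estimate
        x = x - eta * g                      # SGD step with the adapted size
    return x

# Toy usage: subgradient of f(x) = |x - 3| in one dimension. Starting from a
# tiny d0, the estimate d grows toward the true distance during the run, so x
# approaches the minimiser without any hand-tuned learning rate.
x_final = d_adapted_sgd(lambda x: np.sign(x - 3.0), np.zeros(1))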
Author Information
Aaron Defazio (FAIR - Meta AI)
Konstantin Mishchenko (Samsung)
Related Events (a corresponding poster, oral, or spotlight)
- 2023 Poster: Learning-Rate-Free Learning by D-Adaptation
  Wed. Jul 26th, 12:00 -- 01:30 AM, Room: Exhibit Hall 1 #233
More from the Same Authors
- 2023: Hessian Inertia in Neural Networks
  Xuchan Bao · Alberto Bietti · Aaron Defazio · Vivien Cabannes
- 2023: Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes
  Konstantin Mishchenko · Slavomír Hanzely · Peter Richtarik
- 2023 Poster: Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy
  Blake Woodworth · Konstantin Mishchenko · Francis Bach
- 2022 Poster: Proximal and Federated Random Reshuffling
  Konstantin Mishchenko · Ahmed Khaled · Peter Richtarik
- 2022 Spotlight: Proximal and Federated Random Reshuffling
  Konstantin Mishchenko · Ahmed Khaled · Peter Richtarik
- 2022 Poster: ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
  Konstantin Mishchenko · Grigory Malinovsky · Sebastian Stich · Peter Richtarik
- 2022 Spotlight: ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
  Konstantin Mishchenko · Grigory Malinovsky · Sebastian Stich · Peter Richtarik
- 2021: Regularized Newton Method with Global O(1/k^2) Convergence
  Konstantin Mishchenko
- 2020 Poster: Adaptive Gradient Descent without Descent
  Yura Malitsky · Konstantin Mishchenko
- 2018 Poster: A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning
  Konstantin Mishchenko · Franck Iutzeler · Jérôme Malick · Massih-Reza Amini
- 2018 Oral: A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning
  Konstantin Mishchenko · Franck Iutzeler · Jérôme Malick · Massih-Reza Amini