We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) don't increase the stepsize too fast and 2) don't overstep the local curvature. No function values, no line search, and no information about the function are needed except for its gradients. By following these rules, you get a method that adapts to the local geometry, with convergence guarantees depending only on the smoothness in a neighborhood of a solution. Provided the problem is convex, our method converges even if the global smoothness constant is infinite. As an illustration, it can minimize any twice continuously differentiable convex function. We examine its performance on a range of convex and nonconvex problems, including logistic regression and matrix factorization.
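To make the two rules concrete, here is a minimal sketch of a gradient method that uses only successive gradients to set its stepsize. The function name, the arguments `grad`, `x0`, `lam0`, `n_iters`, and the specific constants (the factor 1/2 in the curvature bound and the sqrt-style growth cap) are illustrative choices consistent with the abstract, not necessarily the exact update rule from the paper.

```python
import numpy as np

def adaptive_gradient_descent(grad, x0, n_iters=1000, lam0=1e-7):
    """Gradient descent with a stepsize built from the two rules in the abstract:
    (1) don't increase the stepsize too fast, and
    (2) don't overstep the local curvature, estimated from successive gradients.
    Only gradients are used: no function values, no line search.
    (Illustrative sketch; constants are assumptions, not the paper's exact rule.)"""
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    lam_prev = lam0                        # tiny initial stepsize (assumption)
    x = x_prev - lam_prev * g_prev         # first plain gradient step
    theta = np.inf                         # ratio of consecutive stepsizes

    for _ in range(n_iters):
        g = grad(x)
        # Rule 2: local curvature estimate L_k ~ ||g - g_prev|| / ||x - x_prev||;
        # keep the step below 1 / (2 L_k).
        diff_x = np.linalg.norm(x - x_prev)
        diff_g = np.linalg.norm(g - g_prev)
        curv_step = 0.5 * diff_x / diff_g if diff_g > 0 else np.inf
        # Rule 1: don't grow the stepsize faster than by a factor sqrt(1 + theta).
        growth_step = np.sqrt(1.0 + theta) * lam_prev
        lam = min(growth_step, curv_step)
        if not np.isfinite(lam):           # degenerate case: both bounds infinite
            lam = lam_prev

        x_prev, g_prev = x, g
        x = x - lam * g                    # gradient step with the adaptive stepsize
        theta, lam_prev = lam / lam_prev, lam
    return x
```

As a quick usage example, for a quadratic f(x) = 0.5 x^T A x one would pass `grad = lambda x: A @ x`; the stepsize then adjusts automatically to the curvature of A along the trajectory, without any knowledge of its largest eigenvalue.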
Author Information
Yura Malitsky (EPFL)
Konstantin Mishchenko (King Abdullah University of Science & Technology (KAUST))