Derivative-free optimization (DFO) has recently gained momentum in machine learning, spurring interest in designing faster methods for problems where gradients are not accessible. While some attention has been given to the concept of acceleration in the DFO literature, existing stochastic algorithms for objective functions with a finite-sum structure have not been shown theoretically to achieve an accelerated rate of convergence. Algorithms that use acceleration in such a setting are prone to instabilities, making it difficult to reach convergence. In this work, we exploit the finite-sum structure of the objective to design a variance-reduced DFO algorithm that provably yields acceleration. We prove rates of convergence for both smooth convex and strongly-convex finite-sum objective functions. Finally, we validate our theoretical results empirically on several tasks and datasets.
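To illustrate the general idea described in the abstract (variance reduction applied to a zeroth-order, finite-sum method with a momentum term), here is a minimal sketch. It is not the paper's algorithm: it assumes an SVRG-style control variate built from two-point finite-difference gradient estimates and a heavy-ball momentum buffer, and all names (`zo_grad`, `zo_svrg_momentum`) and hyperparameters are hypothetical choices for this illustration.

```python
# Illustrative sketch (not the paper's exact method): variance-reduced
# zeroth-order optimization of a finite-sum objective
#   f(x) = (1/n) * sum_i f_i(x),
# with a heavy-ball momentum term standing in for acceleration.
import numpy as np

def zo_grad(f_i, x, i, u, mu=1e-4):
    """Two-point zeroth-order estimate of grad f_i(x) along direction u."""
    return (f_i(x + mu * u, i) - f_i(x - mu * u, i)) / (2 * mu) * u

def zo_svrg_momentum(f_i, x0, n, n_epochs=40, inner=100, lr=0.02, beta=0.9, mu=1e-4, seed=0):
    """SVRG-style variance reduction where every gradient is replaced by a
    finite-difference estimate, plus a momentum buffer (illustration only)."""
    rng = np.random.default_rng(seed)
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(n_epochs):
        snap = x.copy()
        # Full zeroth-order gradient at the snapshot, averaged over all components.
        u_full = rng.standard_normal((n,) + x.shape)
        g_snap = np.mean([zo_grad(f_i, snap, i, u_full[i], mu) for i in range(n)], axis=0)
        for _ in range(inner):
            i = rng.integers(n)
            u = rng.standard_normal(x.shape)
            # Control variate: same direction u at the current iterate and the snapshot.
            g = zo_grad(f_i, x, i, u, mu) - zo_grad(f_i, snap, i, u, mu) + g_snap
            v = beta * v - lr * g
            x = x + v
    return x

# Toy finite-sum least-squares problem: f_i(x) = 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(1)
n, d = 50, 10
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)
f_i = lambda x, i: 0.5 * (A[i] @ x - b[i]) ** 2
x_hat = zo_svrg_momentum(f_i, np.zeros(d), n)
print("final objective:", np.mean([f_i(x_hat, i) for i in range(n)]))
```

Reusing the same random direction `u` for the estimates at the current iterate and at the snapshot is what lets the control variate actually reduce the variance of the zeroth-order gradient estimate.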
Author Information
Yuwen Chen (ETH Zurich)
Antonio Orvieto (ETH Zurich)
Aurelien Lucchi (ETH Zurich)
More from the Same Authors
- 2022: Should You Follow the Gradient Flow? Insights from Runge-Kutta Gradient Descent
  Xiang Li · Antonio Orvieto
- 2023: On the Universality of Linear Recurrences Followed by Nonlinear Projections
  Antonio Orvieto · Soham De · Razvan Pascanu · Caglar Gulcehre · Samuel Smith
- 2023: An Adaptive Method for Minimizing Non-negative Losses
  Antonio Orvieto · Lin Xiao
- 2023: On the Advantage of Lion Compared to signSGD with Momentum
  Alessandro Noiato · Luca Biggio · Antonio Orvieto
- 2022 Poster: Anticorrelated Noise Injection for Improved Generalization
  Antonio Orvieto · Hans Kersting · Frank Proske · Francis Bach · Aurelien Lucchi
- 2022 Spotlight: Anticorrelated Noise Injection for Improved Generalization
  Antonio Orvieto · Hans Kersting · Frank Proske · Francis Bach · Aurelien Lucchi
- 2021 Spotlight: Neural Symbolic Regression that scales
  Luca Biggio · Tommaso Bendinelli · Alexander Neitz · Aurelien Lucchi · Giambattista Parascandolo
- 2021 Poster: Neural Symbolic Regression that scales
  Luca Biggio · Tommaso Bendinelli · Alexander Neitz · Aurelien Lucchi · Giambattista Parascandolo
- 2020 Poster: Randomized Block-Diagonal Preconditioning for Parallel Learning
  Celestine Mendler-Dünner · Aurelien Lucchi
- 2018 Poster: A Distributed Second-Order Algorithm You Can Trust
  Celestine Mendler-Dünner · Aurelien Lucchi · Matilde Gargiani · Yatao Bian · Thomas Hofmann · Martin Jaggi
- 2018 Oral: A Distributed Second-Order Algorithm You Can Trust
  Celestine Mendler-Dünner · Aurelien Lucchi · Matilde Gargiani · Yatao Bian · Thomas Hofmann · Martin Jaggi
- 2018 Poster: Escaping Saddles with Stochastic Gradients
  Hadi Daneshmand · Jonas Kohler · Aurelien Lucchi · Thomas Hofmann
- 2018 Oral: Escaping Saddles with Stochastic Gradients
  Hadi Daneshmand · Jonas Kohler · Aurelien Lucchi · Thomas Hofmann
- 2017 Poster: Sub-sampled Cubic Regularization for Non-convex Optimization
  Jonas Kohler · Aurelien Lucchi
- 2017 Talk: Sub-sampled Cubic Regularization for Non-convex Optimization
  Jonas Kohler · Aurelien Lucchi