Poster
Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy
Blake Woodworth · Konstantin Mishchenko · Francis Bach
We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy. Our algorithm is based on approximate proximal-point iterations on the proxy combined with relatively few stochastic gradients from the objective. When the difference between the objective and the proxy is $\delta$-smooth, our algorithm guarantees convergence at a rate matching stochastic gradient descent on a $\delta$-smooth objective, which can lead to substantially better sample efficiency. Our algorithm has many potential applications in machine learning, and provides a principled means of leveraging synthetic data, physics simulators, mixed public and private data, and more.
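The idea in the abstract can be illustrated with a minimal numerical sketch, assuming simple quadratic choices for the objective and proxy (this is an illustration of the proximal-point-on-a-proxy idea, not the paper's exact pseudocode): each outer step draws one gradient of the expensive objective $f$, then approximately solves a proximal subproblem that only queries the cheap proxy $h$.

```python
import numpy as np

# Hedged sketch (not the authors' exact method): minimize an "expensive"
# objective f using mostly cheap gradients of a proxy h.  Each outer step
# takes one gradient g of f, then approximately solves the subproblem
#   min_y  h(y) + <g - grad_h(x), y> + ||y - x||^2 / (2*eta)
# so only the inner solver touches the proxy.  f, h, eta, and the inner
# solver below are illustrative choices made for this demo.

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
H = A.T @ A + np.eye(d)           # curvature shared by f and h in this demo
x_star = rng.standard_normal(d)   # minimizer of the expensive objective

def grad_f(x):                    # hard-to-compute gradient (queried rarely)
    return H @ (x - x_star)

def grad_h(x):                    # cheap proxy gradient (queried freely)
    return H @ x

eta = 1.0                                                # outer step size
inner_lr = 1.0 / np.linalg.eigvalsh(H + np.eye(d) / eta).max()

x = np.zeros(d)
for _ in range(50):               # only 50 expensive gradients in total
    g = grad_f(x)
    y = x.copy()
    for _ in range(100):          # inner loop: proxy gradients only
        y -= inner_lr * (grad_h(y) + (g - grad_h(x)) + (y - x) / eta)
    x = y

print(np.linalg.norm(x - x_star))  # distance to the minimizer of f
```

At the subproblem's solution the outer iterate satisfies a proximal-point fixed-point equation for $f$, so the iterates contract toward the minimizer of $f$ while the vast majority of gradient evaluations are on the proxy.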
Author Information
Blake Woodworth (Inria)
Konstantin Mishchenko (Samsung)
Francis Bach (INRIA - Ecole Normale Supérieure)
More from the Same Authors
- 2023: Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes
  Konstantin Mishchenko · Slavomír Hanzely · Peter Richtarik
- 2023: Differentiable Clustering and Partial Fenchel-Young Losses
  Lawrence Stewart · Francis Bach · Felipe Llinares-Lopez · Quentin Berthet
- 2023 Oral: Learning-Rate-Free Learning by D-Adaptation
  Aaron Defazio · Konstantin Mishchenko
- 2023 Poster: On Bridging the Gap between Mean Field and Finite Width Deep Random Multilayer Perceptron with Batch Normalization
  Amir Joudaki · Hadi Daneshmand · Francis Bach
- 2023 Poster: Learning-Rate-Free Learning by D-Adaptation
  Aaron Defazio · Konstantin Mishchenko
- 2022 Poster: Proximal and Federated Random Reshuffling
  Konstantin Mishchenko · Ahmed Khaled · Peter Richtarik
- 2022 Spotlight: Proximal and Federated Random Reshuffling
  Konstantin Mishchenko · Ahmed Khaled · Peter Richtarik
- 2022 Poster: Convergence of Uncertainty Sampling for Active Learning
  Anant Raj · Francis Bach
- 2022 Poster: ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
  Konstantin Mishchenko · Grigory Malinovsky · Sebastian Stich · Peter Richtarik
- 2022 Spotlight: Convergence of Uncertainty Sampling for Active Learning
  Anant Raj · Francis Bach
- 2022 Spotlight: ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
  Konstantin Mishchenko · Grigory Malinovsky · Sebastian Stich · Peter Richtarik
- 2022 Poster: Anticorrelated Noise Injection for Improved Generalization
  Antonio Orvieto · Hans Kersting · Frank Proske · Francis Bach · Aurelien Lucchi
- 2022 Spotlight: Anticorrelated Noise Injection for Improved Generalization
  Antonio Orvieto · Hans Kersting · Frank Proske · Francis Bach · Aurelien Lucchi
- 2021: Regularized Newton Method with Global O(1/k^2) Convergence
  Konstantin Mishchenko
- 2021 Poster: Disambiguation of Weak Supervision leading to Exponential Convergence rates
  Vivien Cabannnes · Francis Bach · Alessandro Rudi
- 2021 Spotlight: Disambiguation of Weak Supervision leading to Exponential Convergence rates
  Vivien Cabannnes · Francis Bach · Alessandro Rudi
- 2020: Q&A with Francis Bach
  Francis Bach
- 2020: Talk by Francis Bach - Second Order Strikes Back - Globally convergent Newton methods for ill-conditioned generalized self-concordant Losses
  Francis Bach
- 2020 Poster: Stochastic Optimization for Regularized Wasserstein Estimators
  Marin Ballu · Quentin Berthet · Francis Bach
- 2020 Poster: Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization
  Hadrien Hendrikx · Lin Xiao · Sebastien Bubeck · Francis Bach · Laurent Massoulié
- 2020 Poster: Adaptive Gradient Descent without Descent
  Yura Malitsky · Konstantin Mishchenko
- 2020 Poster: Consistent Structured Prediction with Max-Min Margin Markov Networks
  Alex Nowak · Francis Bach · Alessandro Rudi
- 2020 Poster: Structured Prediction with Partial Labelling through the Infimum Loss
  Vivien Cabannnes · Alessandro Rudi · Francis Bach
- 2019 Invited Talk: Online Dictionary Learning for Sparse Coding
  Julien Mairal · Francis Bach · Jean Ponce · Guillermo Sapiro
- 2018 Poster: A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning
  Konstantin Mishchenko · Franck Iutzeler · Jérôme Malick · Massih-Reza Amini
- 2018 Oral: A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning
  Konstantin Mishchenko · Franck Iutzeler · Jérôme Malick · Massih-Reza Amini
- 2017 Poster: Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks
  Kevin Scaman · Francis Bach · Sebastien Bubeck · Yin Tat Lee · Laurent Massoulié
- 2017 Talk: Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks
  Kevin Scaman · Francis Bach · Sebastien Bubeck · Yin Tat Lee · Laurent Massoulié