Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models. Usually, uncorrelated noise is used in such perturbed gradient descent (PGD) methods. It is, however, not known if this is optimal or whether other types of noise could provide better generalization performance. In this paper, we zoom in on the problem of correlating the perturbations of consecutive PGD steps. We consider a variety of objective functions for which we find that GD with anticorrelated perturbations ("Anti-PGD") generalizes significantly better than GD and standard (uncorrelated) PGD. To support these experimental findings, we also derive a theoretical analysis that demonstrates that Anti-PGD moves to wider minima, while GD and PGD remain stuck in suboptimal regions or even diverge. This new connection between anticorrelated noise and generalization opens the field to novel ways to exploit noise for training machine learning models.
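The abstract names the method but gives no formula, so here is a minimal sketch (in NumPy) of one common way to realize anticorrelated perturbations: instead of adding a fresh i.i.d. Gaussian draw at every step as in standard PGD, each step injects the increment of two consecutive i.i.d. draws, which makes consecutive perturbations negatively correlated. The function name, step size, noise scale, and toy objective below are illustrative assumptions, not values or code taken from the paper.

import numpy as np

def anti_pgd(grad, x0, lr=0.1, sigma=0.05, steps=1000, seed=0):
    # Gradient descent with anticorrelated noise injection ("Anti-PGD" sketch).
    # Standard PGD would add an independent draw xi_k at every step; here the
    # injected perturbation is the increment (xi_k - xi_{k-1}) of i.i.d. draws,
    # so consecutive perturbations have correlation -1/2.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    xi_prev = rng.normal(scale=sigma, size=x.shape)
    for _ in range(steps):
        xi = rng.normal(scale=sigma, size=x.shape)
        x = x - lr * grad(x) + (xi - xi_prev)  # anticorrelated perturbation
        xi_prev = xi
    return x

# Toy usage on a quadratic objective f(x) = 0.5 * ||x||^2 (gradient is x).
x_final = anti_pgd(grad=lambda x: x, x0=np.ones(5))
print(x_final)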
Author Information
Antonio Orvieto (ETH Zurich)
Hans Kersting (INRIA and ENS Paris)
Frank Proske (University of Oslo)
Francis Bach (INRIA - Ecole Normale Supérieure)
Aurelien Lucchi (ETH Zurich)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: Anticorrelated Noise Injection for Improved Generalization »
  Tue. Jul 19th through Wed. Jul 20th, Room Hall E #710
More from the Same Authors
- 2022 : Should You Follow the Gradient Flow? Insights from Runge-Kutta Gradient Descent »
  Xiang Li · Antonio Orvieto
- 2023 : On the Universality of Linear Recurrences Followed by Nonlinear Projections »
  Antonio Orvieto · Soham De · Razvan Pascanu · Caglar Gulcehre · Samuel Smith
- 2023 : An Adaptive Method for Minimizing Non-negative Losses »
  Antonio Orvieto · Lin Xiao
- 2023 : On the Advantage of Lion Compared to signSGD with Momentum »
  Alessandro Noiato · Luca Biggio · Antonio Orvieto
- 2023 : Differentiable Clustering and Partial Fenchel-Young Losses »
  Lawrence Stewart · Francis Bach · Felipe Llinares-Lopez · Quentin Berthet
- 2023 Poster: On Bridging the Gap between Mean Field and Finite Width Deep Random Multilayer Perceptron with Batch Normalization »
  Amir Joudaki · Hadi Daneshmand · Francis Bach
- 2023 Poster: An SDE for Modeling SAM: Theory and Insights »
  Enea Monzio Compagnoni · Luca Biggio · Antonio Orvieto · Frank Proske · Hans Kersting · Aurelien Lucchi
- 2023 Poster: Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy »
  Blake Woodworth · Konstantin Mishchenko · Francis Bach
- 2023 Poster: A Theoretical Analysis of the Learning Dynamics under Class Imbalance »
  Emanuele Francazi · Marco Baity-Jesi · Aurelien Lucchi
- 2022 Poster: Convergence of Uncertainty Sampling for Active Learning »
  Anant Raj · Francis Bach
- 2022 Spotlight: Convergence of Uncertainty Sampling for Active Learning »
  Anant Raj · Francis Bach
- 2021 Poster: Disambiguation of Weak Supervision leading to Exponential Convergence rates »
  Vivien Cabannnes · Francis Bach · Alessandro Rudi
- 2021 Spotlight: Disambiguation of Weak Supervision leading to Exponential Convergence rates »
  Vivien Cabannnes · Francis Bach · Alessandro Rudi
- 2020 : Q&A with Francis Bach »
  Francis Bach
- 2020 : Talk by Francis Bach - Second Order Strikes Back - Globally convergent Newton methods for ill-conditioned generalized self-concordant Losses »
  Francis Bach
- 2020 Poster: Stochastic Optimization for Regularized Wasserstein Estimators »
  Marin Ballu · Quentin Berthet · Francis Bach
- 2020 Poster: Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization »
  Hadrien Hendrikx · Lin Xiao · Sebastien Bubeck · Francis Bach · Laurent Massoulié
- 2020 Poster: An Accelerated DFO Algorithm for Finite-sum Convex Functions »
  Yuwen Chen · Antonio Orvieto · Aurelien Lucchi
- 2020 Poster: Consistent Structured Prediction with Max-Min Margin Markov Networks »
  Alex Nowak · Francis Bach · Alessandro Rudi
- 2020 Poster: Structured Prediction with Partial Labelling through the Infimum Loss »
  Vivien Cabannnes · Alessandro Rudi · Francis Bach
- 2019 Invited Talk: Online Dictionary Learning for Sparse Coding »
  Julien Mairal · Francis Bach · Jean Ponce · Guillermo Sapiro
- 2017 Poster: Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks »
  Kevin Scaman · Francis Bach · Sebastien Bubeck · Yin Tat Lee · Laurent Massoulié
- 2017 Talk: Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks »
  Kevin Scaman · Francis Bach · Sebastien Bubeck · Yin Tat Lee · Laurent Massoulié