Gaussian noise injections (GNIs) are a family of simple and widely used regularisation methods for training neural networks, in which additive or multiplicative Gaussian noise is injected into the network activations at every iteration of the optimisation algorithm, typically stochastic gradient descent (SGD). In this paper, we focus on the so-called 'implicit effect' of GNIs, which is the effect of the injected noise on the dynamics of SGD. We show that this effect induces an asymmetric heavy-tailed noise on SGD gradient updates. To model these modified dynamics, we first develop a Langevin-like stochastic differential equation driven by a general family of asymmetric heavy-tailed noise. Using this model, we then formally prove that GNIs induce an 'implicit bias', which varies depending on the heaviness of the tails and the level of asymmetry. Our empirical results confirm that different types of neural networks trained with GNIs are well modelled by the proposed dynamics and that the implicit effect of these injections induces a bias that degrades the performance of networks.
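To make the injection step described in the abstract concrete, here is a minimal sketch in plain Python of adding Gaussian noise to a layer's activations during a forward pass. It is not the authors' implementation: the function names `inject_gaussian_noise` and `noisy_hidden_layer`, the ReLU layer, and the choice to write multiplicative noise as `h * (1 + eps)` with `eps ~ N(0, sigma^2)` are all assumptions made for illustration.

```python
import random


def inject_gaussian_noise(activations, sigma, mode="additive", rng=None):
    """Return a noisy copy of a list of activations (hypothetical helper).

    additive:        h + eps        with eps ~ N(0, sigma^2)
    multiplicative:  h * (1 + eps)  with eps ~ N(0, sigma^2)  (assumed form)
    """
    rng = rng or random.Random()
    noisy = []
    for h in activations:
        eps = rng.gauss(0.0, sigma)  # fresh noise sample for every unit
        if mode == "additive":
            noisy.append(h + eps)
        elif mode == "multiplicative":
            noisy.append(h * (1.0 + eps))
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return noisy


def noisy_hidden_layer(x, weights, sigma, mode="additive", rng=None):
    """ReLU hidden layer whose outputs are perturbed by Gaussian noise.

    In training, this would be called at every SGD iteration so that each
    gradient step sees a different noise draw.
    """
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return inject_gaussian_noise(hidden, sigma, mode, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so the demo is repeatable
    x = [1.0, -2.0]
    weights = [[0.5, 0.1], [-0.3, 0.8], [1.0, 1.0]]
    print(noisy_hidden_layer(x, weights, sigma=0.1, mode="additive", rng=rng))
    print(noisy_hidden_layer(x, weights, sigma=0.1, mode="multiplicative", rng=rng))
```

Under this assumed multiplicative form, a unit whose activation is exactly zero stays at zero, whereas additive noise perturbs every unit regardless of its value.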
Author Information
Alexander D Camuto (University of Oxford)
Xiaoyu Wang (Florida State University)
Lingjiong Zhu (Florida State University)
Christopher Holmes (University of Oxford)
Mert Gurbuzbalaban (Rutgers University)
Umut Simsekli (Inria/ENS)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections
  Thu. Jul 22nd 12:20 -- 12:25 AM
More from the Same Authors
- 2023 Poster: Generalization Bounds using Data-Dependent Fractal Dimensions
  Benjamin Dupuis · George Deligiannidis · Umut Simsekli
- 2023 Poster: Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions
  Anant Raj · Lingjiong Zhu · Mert Gurbuzbalaban · Umut Simsekli
- 2023 Poster: PWSHAP: A Path-Wise Explanation Model for Targeted Variables
  Lucile Ter-Minassian · Oscar Clivio · Karla DiazOrdaz · Robin Evans · Christopher Holmes
- 2022 Poster: Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers
  Liam Hodgkinson · Umut Simsekli · Rajiv Khanna · Michael Mahoney
- 2022 Spotlight: Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers
  Liam Hodgkinson · Umut Simsekli · Rajiv Khanna · Michael Mahoney
- 2021 Poster: The Heavy-Tail Phenomenon in SGD
  Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
- 2021 Spotlight: The Heavy-Tail Phenomenon in SGD
  Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
- 2021 Poster: Relative Positional Encoding for Transformers with Linear Complexity
  Antoine Liutkus · Ondřej Cífka · Shih-Lun Wu · Umut Simsekli · Yi-Hsuan Yang · Gaël RICHARD
- 2021 Oral: Relative Positional Encoding for Transformers with Linear Complexity
  Antoine Liutkus · Ondřej Cífka · Shih-Lun Wu · Umut Simsekli · Yi-Hsuan Yang · Gaël RICHARD
- 2020 Poster: Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise
  Umut Simsekli · Lingjiong Zhu · Yee-Whye Teh · Mert Gurbuzbalaban
- 2019 Poster: A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
  Umut Simsekli · Levent Sagun · Mert Gurbuzbalaban
- 2019 Oral: A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
  Umut Simsekli · Levent Sagun · Mert Gurbuzbalaban
- 2019 Poster: Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap
  Edwin Fong · Simon Lyddon · Christopher Holmes
- 2019 Poster: Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances
  Bugra Can · Mert Gurbuzbalaban · Lingjiong Zhu
- 2019 Oral: Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap
  Edwin Fong · Simon Lyddon · Christopher Holmes
- 2019 Oral: Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances
  Bugra Can · Mert Gurbuzbalaban · Lingjiong Zhu
- 2018 Poster: Probabilistic Boolean Tensor Decomposition
  Tammo Rukat · Christopher Holmes · Christopher Yau
- 2018 Oral: Probabilistic Boolean Tensor Decomposition
  Tammo Rukat · Christopher Holmes · Christopher Yau
- 2017 Poster: Bayesian Boolean Matrix Factorisation
  Tammo Rukat · Christopher Holmes · Michalis Titsias · Christopher Yau
- 2017 Talk: Bayesian Boolean Matrix Factorisation
  Tammo Rukat · Christopher Holmes · Michalis Titsias · Christopher Yau