Oral
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
Umut Simsekli · Levent Sagun · Mert Gurbuzbalaban
The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussianity assumption might fail to hold in deep learning settings and hence render the Brownian motion-based analyses inappropriate. Inspired by non-Gaussian natural phenomena, we consider the GN in a more general context and invoke the generalized CLT (GCLT), which suggests that the GN converges to a heavy-tailed $\alpha$-stable random variable. Accordingly, we propose to analyze SGD as an SDE driven by a Lévy motion. Such SDEs can incur 'jumps', which force the SDE to transition from narrow minima to wider minima, as proven by existing metastability theory. To validate the $\alpha$-stable assumption, we conduct experiments on common deep learning scenarios and show that in all settings, the GN is highly non-Gaussian and admits heavy tails. We investigate the tail behavior in varying network architectures and sizes, loss functions, and datasets. Our results open up a different perspective and shed more light on the belief that SGD prefers wide minima.
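To make the notion of a tail index concrete, the following is a minimal sketch, not the authors' code. It draws symmetric $\alpha$-stable samples with the Chambers-Mallows-Stuck method and recovers $\alpha$ with a block-sum log-moment estimator, which relies on the fact that a sum of $K_1$ i.i.d. symmetric $\alpha$-stable variables scales like $K_1^{1/\alpha}$. The function names `sample_sas` and `estimate_alpha`, the block size, and the sample counts are assumptions chosen for illustration; synthetic noise stands in for real stochastic gradient noise.

```python
import numpy as np


def sample_sas(alpha, size, rng):
    """Draw symmetric alpha-stable samples (Chambers-Mallows-Stuck), alpha in (0, 2], alpha != 1."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    w = rng.exponential(1.0, size)                # unit-rate exponential
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha))


def estimate_alpha(x, block_size):
    """Block-sum estimator of the tail index for centered, symmetric samples.

    Uses 1/alpha ~= (mean log|block sums| - mean log|x|) / log(block_size).
    """
    x = np.asarray(x, dtype=float)
    n_blocks = x.size // block_size
    x = x[: n_blocks * block_size]                # drop the incomplete last block
    block_sums = x.reshape(n_blocks, block_size).sum(axis=1)
    inv_alpha = (np.mean(np.log(np.abs(block_sums)))
                 - np.mean(np.log(np.abs(x)))) / np.log(block_size)
    return 1.0 / inv_alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heavy = sample_sas(1.5, 250_000, rng)         # heavy-tailed noise, alpha = 1.5
    gauss = rng.normal(size=250_000)              # Gaussian noise, alpha = 2
    print("heavy-tailed estimate:", estimate_alpha(heavy, 500))
    print("Gaussian estimate:    ", estimate_alpha(gauss, 500))
```

An estimate close to 2 is consistent with Gaussian noise, while values clearly below 2 indicate heavy tails. Applying such an estimator to real gradient noise would first require centering the minibatch gradients by the full-batch gradient, which this sketch does not do.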
Author Information
Umut Simsekli (Telecom ParisTech)
Levent Sagun (CEA)
Mert Gurbuzbalaban (Rutgers University)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
  Fri. Jun 14th 01:30 -- 04:00 AM Room Pacific Ballroom #76
More from the Same Authors
- 2021 Poster: Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections
  Alexander D Camuto · Xiaoyu Wang · Lingjiong Zhu · Christopher Holmes · Mert Gurbuzbalaban · Umut Simsekli
- 2021 Spotlight: Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections
  Alexander D Camuto · Xiaoyu Wang · Lingjiong Zhu · Christopher Holmes · Mert Gurbuzbalaban · Umut Simsekli
- 2021 Poster: The Heavy-Tail Phenomenon in SGD
  Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
- 2021 Spotlight: The Heavy-Tail Phenomenon in SGD
  Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
- 2020 Poster: Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise
  Umut Simsekli · Lingjiong Zhu · Yee-Whye Teh · Mert Gurbuzbalaban
- 2019 Poster: Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization
  Thanh Huy Nguyen · Umut Simsekli · Gaël RICHARD
- 2019 Poster: Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions
  Antoine Liutkus · Umut Simsekli · Szymon Majewski · Alain Durmus · Fabian-Robert Stöter
- 2019 Oral: Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization
  Thanh Huy Nguyen · Umut Simsekli · Gaël RICHARD
- 2019 Oral: Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions
  Antoine Liutkus · Umut Simsekli · Szymon Majewski · Alain Durmus · Fabian-Robert Stöter
- 2019 Poster: Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances
  Bugra Can · Mert Gurbuzbalaban · Lingjiong Zhu
- 2019 Oral: Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances
  Bugra Can · Mert Gurbuzbalaban · Lingjiong Zhu
- 2018 Poster: Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization
  Umut Simsekli · Cagatay Yildiz · Thanh Huy Nguyen · Ali Taylan Cemgil · Gaël RICHARD
- 2018 Oral: Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization
  Umut Simsekli · Cagatay Yildiz · Thanh Huy Nguyen · Ali Taylan Cemgil · Gaël RICHARD
- 2017 Poster: Fractional Langevin Monte Carlo: Exploring Levy Driven Stochastic Differential Equations for MCMC
  Umut Simsekli
- 2017 Talk: Fractional Langevin Monte Carlo: Exploring Levy Driven Stochastic Differential Equations for MCMC
  Umut Simsekli