Poster
The Heavy-Tail Phenomenon in SGD
Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
In recent years, various notions of capacity and complexity have been proposed for characterizing the generalization properties of stochastic gradient descent (SGD) in deep learning. Some of the popular notions that correlate well with the performance on unseen data are (i) the `flatness' of the local minimum found by SGD, which is related to the eigenvalues of the Hessian, (ii) the ratio of the stepsize $\eta$ to the batch-size $b$, which essentially controls the magnitude of the stochastic gradient noise, and (iii) the `tail-index', which measures the heaviness of the tails of the network weights at convergence. In this paper, we argue that these three seemingly unrelated perspectives for generalization are deeply linked to each other. We claim that depending on the structure of the Hessian of the loss at the minimum, and the choices of the algorithm parameters $\eta$ and $b$, the SGD iterates will converge to a \emph{heavy-tailed} stationary distribution. We rigorously prove this claim in the setting of quadratic optimization: we show that even in a simple linear regression problem with independent and identically distributed data whose distribution has finite moments of all orders, the iterates can be heavy-tailed with infinite variance. We further characterize the behavior of the tails with respect to algorithm parameters, the dimension, and the curvature. We then translate our results into insights about the behavior of SGD in deep learning. We support our theory with experiments conducted on synthetic data and on fully connected and convolutional neural networks.
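To make the linear-regression claim concrete, below is a minimal simulation sketch (not the authors' code): it runs minibatch SGD on least squares with i.i.d. Gaussian data and estimates the tail index $\alpha$ of the iterates with a standard Hill estimator. All parameter values ($d$, $n$, $\eta$, $b$, iteration counts) are illustrative assumptions; as the paper shows, whether $\alpha$ drops below 2 (infinite variance) depends on $\eta$, $b$, the dimension, and the curvature.

```python
# Sketch: heavy tails in SGD for linear regression with Gaussian data.
# The data distribution has finite moments of all orders, yet for a large
# enough eta/b the stationary distribution of the iterates can be heavy-tailed.
import numpy as np

rng = np.random.default_rng(0)

d, n = 10, 1000              # dimension and number of data points (illustrative)
eta, b = 0.1, 1              # stepsize and batch size; the ratio eta/b matters
n_iter, burn_in = 50_000, 10_000

A = rng.standard_normal((n, d))          # i.i.d. Gaussian features
y = A @ rng.standard_normal(d) + rng.standard_normal(n)

w = np.zeros(d)
norms = []
for t in range(n_iter):
    idx = rng.choice(n, size=b, replace=False)
    grad = A[idx].T @ (A[idx] @ w - y[idx]) / b   # minibatch gradient
    w = w - eta * grad
    if t >= burn_in:                              # discard transient phase
        norms.append(np.linalg.norm(w))

# Hill estimator of the tail index alpha from the k largest samples of ||w||;
# an estimate below 2 is consistent with infinite variance.
x = np.sort(np.asarray(norms))[::-1]
k = max(10, len(x) // 20)
alpha_hat = 1.0 / np.mean(np.log(x[:k] / x[k]))
print(f"Hill tail-index estimate: alpha ~ {alpha_hat:.2f}")
```

In this toy setting one would expect a smaller $\eta$ (or a larger $b$) to yield a larger estimated $\alpha$, i.e. lighter tails, in line with the paper's characterization of the tails in terms of the algorithm parameters.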
Author Information
Mert Gurbuzbalaban (Rutgers University)
Umut Simsekli (Inria/ENS)
Lingjiong Zhu (Florida State University)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: The Heavy-Tail Phenomenon in SGD
  Wed. Jul 21st, 01:20 -- 01:25 AM
More from the Same Authors
- 2023 Poster: Generalization Bounds using Data-Dependent Fractal Dimensions
  Benjamin Dupuis · George Deligiannidis · Umut Simsekli
- 2023 Poster: Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions
  Anant Raj · Lingjiong Zhu · Mert Gurbuzbalaban · Umut Simsekli
- 2022 Poster: Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers
  Liam Hodgkinson · Umut Simsekli · Rajiv Khanna · Michael Mahoney
- 2022 Spotlight: Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers
  Liam Hodgkinson · Umut Simsekli · Rajiv Khanna · Michael Mahoney
- 2021 Poster: Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections
  Alexander D Camuto · Xiaoyu Wang · Lingjiong Zhu · Christopher Holmes · Mert Gurbuzbalaban · Umut Simsekli
- 2021 Spotlight: Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections
  Alexander D Camuto · Xiaoyu Wang · Lingjiong Zhu · Christopher Holmes · Mert Gurbuzbalaban · Umut Simsekli
- 2021 Poster: Relative Positional Encoding for Transformers with Linear Complexity
  Antoine Liutkus · Ondřej Cífka · Shih-Lun Wu · Umut Simsekli · Yi-Hsuan Yang · Gaël RICHARD
- 2021 Oral: Relative Positional Encoding for Transformers with Linear Complexity
  Antoine Liutkus · Ondřej Cífka · Shih-Lun Wu · Umut Simsekli · Yi-Hsuan Yang · Gaël RICHARD
- 2020 Poster: Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise
  Umut Simsekli · Lingjiong Zhu · Yee-Whye Teh · Mert Gurbuzbalaban
- 2019 Poster: A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
  Umut Simsekli · Levent Sagun · Mert Gurbuzbalaban
- 2019 Oral: A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
  Umut Simsekli · Levent Sagun · Mert Gurbuzbalaban
- 2019 Poster: Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances
  Bugra Can · Mert Gurbuzbalaban · Lingjiong Zhu
- 2019 Oral: Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances
  Bugra Can · Mert Gurbuzbalaban · Lingjiong Zhu