Timezone: »
Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood. While recent work has revealed connections between generalization and heavy-tailed behavior in stochastic optimization, they mainly relied on continuous-time approximations; and a rigorous treatment for the original discrete-time iterations is yet to be performed. To bridge this gap, we present novel bounds linking generalization to the \emph{lower tail exponent} of the transition kernel associated with the optimizer around a local minimum, in \emph{both} discrete- and continuous-time settings. To achieve this, we first prove a data- and algorithm-dependent generalization bound in terms of the celebrated Fernique--Talagrand functional applied to the trajectory of the optimizer. Then, we specialize this result by exploiting the Markovian structure of stochastic optimizers, and derive bounds in terms of their (data-dependent) transition kernels. We support our theory with empirical results from a variety of neural networks, showing correlations between generalization error and lower tail exponents.
Author Information
Liam Hodgkinson (University of California Berkeley)
Umut Simsekli (Inria/ENS)
Rajiv Khanna (UC Berkeley)
Michael Mahoney (UC Berkeley)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Spotlight: Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers »
Dates n/a. Room None
More from the Same Authors
-
2022 Poster: AutoIP: A United Framework to Integrate Physics into Gaussian Processes »
Da Long · Zheng Wang · Aditi Krishnapriyan · Robert Kirby · Shandian Zhe · Michael Mahoney -
2022 Spotlight: AutoIP: A United Framework to Integrate Physics into Gaussian Processes »
Da Long · Zheng Wang · Aditi Krishnapriyan · Robert Kirby · Shandian Zhe · Michael Mahoney -
2022 Poster: Fat–Tailed Variational Inference with Anisotropic Tail Adaptive Flows »
Feynman Liang · Michael Mahoney · Liam Hodgkinson -
2022 Spotlight: Fat–Tailed Variational Inference with Anisotropic Tail Adaptive Flows »
Feynman Liang · Michael Mahoney · Liam Hodgkinson -
2022 Poster: GACT: Activation Compressed Training for Generic Network Architectures »
Xiaoxuan Liu · Lianmin Zheng · Dequan Wang · Yukuo Cen · Weize Chen · Xu Han · Jianfei Chen · Zhiyuan Liu · Jie Tang · Joseph Gonzalez · Michael Mahoney · Alvin Cheung -
2022 Spotlight: GACT: Activation Compressed Training for Generic Network Architectures »
Xiaoxuan Liu · Lianmin Zheng · Dequan Wang · Yukuo Cen · Weize Chen · Xu Han · Jianfei Chen · Zhiyuan Liu · Jie Tang · Joseph Gonzalez · Michael Mahoney · Alvin Cheung -
2022 Poster: Neurotoxin: Durable Backdoors in Federated Learning »
Zhengming Zhang · Ashwinee Panda · Linyue Song · Yaoqing Yang · Michael Mahoney · Prateek Mittal · Kannan Ramchandran · Joseph E Gonzalez -
2022 Spotlight: Neurotoxin: Durable Backdoors in Federated Learning »
Zhengming Zhang · Ashwinee Panda · Linyue Song · Yaoqing Yang · Michael Mahoney · Prateek Mittal · Kannan Ramchandran · Joseph E Gonzalez -
2021 : Theory of feature selection »
Rajiv Khanna -
2021 Workshop: Beyond first-order methods in machine learning systems »
Albert S Berahas · Anastasios Kyrillidis · Fred Roosta · Amir Gholaminejad · Michael Mahoney · Rachael Tappenden · Raghu Bollapragada · Rixon Crane · J. Lyle Kim -
2021 Poster: HAWQ-V3: Dyadic Neural Network Quantization »
Zhewei Yao · Zhen Dong · Zhangcheng Zheng · Amir Gholaminejad · Jiali Yu · Eric Tan · Leyuan Wang · Qijing Huang · Yida Wang · Michael Mahoney · EECS Kurt Keutzer -
2021 Spotlight: HAWQ-V3: Dyadic Neural Network Quantization »
Zhewei Yao · Zhen Dong · Zhangcheng Zheng · Amir Gholaminejad · Jiali Yu · Eric Tan · Leyuan Wang · Qijing Huang · Yida Wang · Michael Mahoney · EECS Kurt Keutzer -
2021 Poster: Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections »
Alexander D Camuto · Xiaoyu Wang · Lingjiong Zhu · Christopher Holmes · Mert Gurbuzbalaban · Umut Simsekli -
2021 Spotlight: Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections »
Alexander D Camuto · Xiaoyu Wang · Lingjiong Zhu · Christopher Holmes · Mert Gurbuzbalaban · Umut Simsekli -
2021 Poster: The Heavy-Tail Phenomenon in SGD »
Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu -
2021 Poster: ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training »
Jianfei Chen · Lianmin Zheng · Zhewei Yao · Dequan Wang · Ion Stoica · Michael Mahoney · Joseph E Gonzalez -
2021 Oral: ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training »
Jianfei Chen · Lianmin Zheng · Zhewei Yao · Dequan Wang · Ion Stoica · Michael Mahoney · Joseph E Gonzalez -
2021 Spotlight: The Heavy-Tail Phenomenon in SGD »
Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu -
2021 Poster: Relative Positional Encoding for Transformers with Linear Complexity »
Antoine Liutkus · Ondřej Cífka · Shih-Lun Wu · Umut Simsekli · Yi-Hsuan Yang · Gaël RICHARD -
2021 Poster: Multiplicative Noise and Heavy Tails in Stochastic Optimization »
Liam Hodgkinson · Michael Mahoney -
2021 Spotlight: Multiplicative Noise and Heavy Tails in Stochastic Optimization »
Liam Hodgkinson · Michael Mahoney -
2021 Oral: Relative Positional Encoding for Transformers with Linear Complexity »
Antoine Liutkus · Ondřej Cífka · Shih-Lun Wu · Umut Simsekli · Yi-Hsuan Yang · Gaël RICHARD -
2020 : Determinantal Point Processes in Randomized Numerical Linear Algebra »
Michael Mahoney -
2020 Workshop: Beyond first order methods in machine learning systems »
Albert S Berahas · Amir Gholaminejad · Anastasios Kyrillidis · Michael Mahoney · Fred Roosta -
2020 Poster: Forecasting Sequential Data Using Consistent Koopman Autoencoders »
Omri Azencot · N. Benjamin Erichson · Vanessa Lin · Michael Mahoney -
2020 Poster: PowerNorm: Rethinking Batch Normalization in Transformers »
Sheng Shen · Zhewei Yao · Amir Gholaminejad · Michael Mahoney · Kurt Keutzer -
2020 Poster: Error Estimation for Sketched SVD via the Bootstrap »
Miles Lopes · N. Benjamin Erichson · Michael Mahoney -
2019 : Poster discussion »
Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari -
2019 : Why Deep Learning Works: Traditional and Heavy-Tailed Implicit Self-Regularization in Deep Neural Networks »
Michael Mahoney -
2019 Poster: Traditional and Heavy Tailed Self Regularization in Neural Network Models »
Michael Mahoney · Charles H Martin -
2019 Oral: Traditional and Heavy Tailed Self Regularization in Neural Network Models »
Michael Mahoney · Charles H Martin -
2018 Poster: Out-of-sample extension of graph adjacency spectral embedding »
Keith Levin · Fred Roosta · Michael Mahoney · Carey Priebe -
2018 Oral: Out-of-sample extension of graph adjacency spectral embedding »
Keith Levin · Fred Roosta · Michael Mahoney · Carey Priebe -
2018 Poster: Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap »
Miles Lopes · Shusen Wang · Michael Mahoney -
2018 Oral: Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap »
Miles Lopes · Shusen Wang · Michael Mahoney -
2017 Poster: Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging »
Shusen Wang · Alex Gittens · Michael Mahoney -
2017 Poster: Capacity Releasing Diffusion for Speed and Locality. »
Di Wang · Kimon Fountoulakis · Monika Henzinger · Michael Mahoney · Satish Rao -
2017 Talk: Capacity Releasing Diffusion for Speed and Locality. »
Di Wang · Kimon Fountoulakis · Monika Henzinger · Michael Mahoney · Satish Rao -
2017 Talk: Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging »
Shusen Wang · Alex Gittens · Michael Mahoney