Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production-quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature AlexNet. Empirical and theoretical results clearly indicate that the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of regularization, such as Dropout or Weight Norm constraints. Building on recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, we develop a theory to identify \emph{5+1 Phases of Training}, corresponding to increasing amounts of \emph{Implicit Self-Regularization}. For smaller and/or older DNNs, this Implicit Self-Regularization is like traditional Tikhonov regularization, in that there is a ``size scale'' separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of \emph{Heavy-Tailed Self-Regularization}, similar to the self-organization seen in the statistical physics of disordered systems. This Implicit Self-Regularization can depend strongly on the many knobs of the training process. By exploiting the generalization gap phenomenon, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size.
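As a concrete illustration of the analysis the abstract describes, the sketch below computes the empirical spectral density (ESD) of a single layer weight matrix and a rough power-law exponent for its tail. This is a minimal sketch, not the authors' code: the function names `layer_esd` and `tail_exponent`, the Gaussian stand-in for the weight matrix, and the tail cutoff k=50 are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): ESD of a layer weight
# matrix W and a crude power-law fit to its tail. The cutoff k=50 and the
# random Gaussian W below are illustrative assumptions.
import numpy as np

def layer_esd(W):
    """Eigenvalues of the layer correlation matrix X = W^T W / N, for W of shape (N, M)."""
    N, M = W.shape
    X = (W.T @ W) / N
    return np.linalg.eigvalsh(X)            # real, non-negative, ascending

def tail_exponent(eigs, k=50):
    """Rough power-law exponent for the ESD tail, via the continuous MLE
    applied to the k largest eigenvalues with the (k+1)-th largest as x_min."""
    lam = np.sort(eigs)[::-1]                # descending
    x_min = lam[k]
    tail = lam[:k]
    return 1.0 + k / np.sum(np.log(tail / x_min))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for a trained fully-connected layer; substitute a real W
    # (e.g. an FC layer from a pre-trained model) to mirror the paper's setting.
    W = rng.standard_normal((1024, 512))
    eigs = layer_esd(W)
    print(f"largest eigenvalue (bulk edge for this random W): {eigs.max():.3f}")
    print(f"estimated tail exponent: {tail_exponent(eigs):.2f}")
```

For the random Gaussian stand-in, the ESD follows the Marchenko-Pastur law and the fitted exponent comes out large; a genuinely heavy-tailed layer from a well-trained model would instead yield a much smaller exponent, the signature of the Heavy-Tailed Self-Regularization described above.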
Author Information
Michael Mahoney (UC Berkeley)
Charles H Martin (Calculation Consulting)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: Traditional and Heavy Tailed Self Regularization in Neural Network Models »
  Wed Jun 12th 07:15 -- 07:20 PM, Room 104
More from the Same Authors
- 2020 Workshop: Beyond first order methods in machine learning systems »
  Albert S Berahas · Amir Gholaminejad · Anastasios Kyrillidis · Michael Mahoney · Fred Roosta
- 2020 Poster: Forecasting Sequential Data Using Consistent Koopman Autoencoders »
  Omri Azencot · N. Benjamin Erichson · Vanessa Lin · Michael Mahoney
- 2020 Poster: PowerNorm: Rethinking Batch Normalization in Transformers »
  Sheng Shen · Zhewei Yao · Amir Gholaminejad · Michael Mahoney · Kurt Keutzer
- 2020 Poster: Error Estimation for Sketched SVD via the Bootstrap »
  Miles Lopes · N. Benjamin Erichson · Michael Mahoney
- 2018 Poster: Out-of-sample extension of graph adjacency spectral embedding »
  Keith Levin · Fred Roosta · Michael Mahoney · Carey Priebe
- 2018 Oral: Out-of-sample extension of graph adjacency spectral embedding »
  Keith Levin · Fred Roosta · Michael Mahoney · Carey Priebe
- 2018 Poster: Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap »
  Miles Lopes · Shusen Wang · Michael Mahoney
- 2018 Oral: Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap »
  Miles Lopes · Shusen Wang · Michael Mahoney
- 2017 Poster: Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging »
  Shusen Wang · Alex Gittens · Michael Mahoney
- 2017 Poster: Capacity Releasing Diffusion for Speed and Locality »
  Di Wang · Kimon Fountoulakis · Monika Henzinger · Michael Mahoney · Satish Rao
- 2017 Talk: Capacity Releasing Diffusion for Speed and Locality »
  Di Wang · Kimon Fountoulakis · Monika Henzinger · Michael Mahoney · Satish Rao
- 2017 Talk: Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging »
  Shusen Wang · Alex Gittens · Michael Mahoney