Recent studies of gradient descent with large step sizes have shown that there is often a regime with an initial increase in the largest eigenvalue of the loss Hessian (progressive sharpening), followed by a stabilization of the eigenvalue near the maximum value that allows convergence (edge of stability). These phenomena are intrinsically non-linear and do not happen for models in the constant Neural Tangent Kernel (NTK) regime, for which the predictive function is approximately linear in the parameters. As such, we consider the next simplest class of predictive models, namely those that are quadratic in the parameters, which we call second-order regression models. For quadratic objectives in two dimensions, we prove that this second-order regression model exhibits progressive sharpening of the NTK eigenvalue towards a value that differs slightly from the edge of stability, which we explicitly compute. In higher dimensions, the model generically shows similar behavior, even without the specific structure of a neural network, suggesting that progressive sharpening and edge-of-stability behavior are not unique features of neural networks, and could be a more general property of discrete learning algorithms in high-dimensional non-linear models.
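For readers who want to experiment with the setup sketched in the abstract, the following is a minimal, self-contained illustration (not code from the paper; the dimension, scales, step size, and variable names are all illustrative assumptions) of full-batch gradient descent on a model that is quadratic in its parameters. It tracks the scalar NTK and the largest eigenvalue of the loss Hessian ("sharpness") against the gradient-descent stability threshold 2/η. Whether the sharpness actually rises all the way to that threshold depends on the initialization scale and step size, which is the regime the paper analyzes.

```python
# Minimal sketch, assuming an illustrative scalar second-order regression model:
#   f(theta) = w @ theta + 0.5 * theta @ Q @ theta
# Gradient descent on the squared loss, tracking the scalar NTK ||df/dtheta||^2
# and the largest eigenvalue of the loss Hessian, compared to 2 / eta.
import numpy as np

rng = np.random.default_rng(0)
d = 50            # parameter dimension (illustrative)
eta = 0.05        # step size; the edge-of-stability threshold is 2 / eta
y = 1.0           # scalar regression target

# Random quadratic model; Q is symmetrized so it is a genuine Hessian of f.
w = rng.normal(size=d) / np.sqrt(d)
Q = rng.normal(size=(d, d)) / np.sqrt(d)
Q = 0.5 * (Q + Q.T)

theta = 0.1 * rng.normal(size=d)   # small initialization (illustrative)

def model(theta):
    return w @ theta + 0.5 * theta @ Q @ theta

for step in range(300):
    resid = model(theta) - y
    jac = w + Q @ theta                        # df/dtheta
    ntk = jac @ jac                            # scalar NTK for one example
    hess = np.outer(jac, jac) + resid * Q      # Hessian of 0.5 * resid**2
    sharpness = np.linalg.eigvalsh(hess)[-1]   # largest Hessian eigenvalue
    if step % 30 == 0:
        print(f"step {step:3d}  loss {0.5 * resid**2:8.4f}  "
              f"NTK {ntk:7.3f}  sharpness {sharpness:7.3f}  2/eta {2 / eta:.1f}")
    theta = theta - eta * resid * jac          # gradient descent step
```

In the large-step-size regime studied in the paper, the printed sharpness would be expected to increase during training and then hover near (but, per the paper's result, not exactly at) the 2/η line; with the small illustrative scales above it may instead settle below the threshold.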
Author Information
Atish Agarwala (Google DeepMind)
Fabian Pedregosa (Google)
Jeffrey Pennington (Google Brain)
More from the Same Authors
- 2023 Poster: SAM operates far from home: eigenvalue regularization as a dynamical phenomenon
  Atish Agarwala · Yann Nicolas Dauphin
- 2022 Poster: Deep equilibrium networks are sensitive to initialization statistics
  Atish Agarwala · Samuel Schoenholz
- 2022 Poster: Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm
  Lechao Xiao · Jeffrey Pennington
- 2022 Poster: Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling
  Jiri Hron · Roman Novak · Jeffrey Pennington · Jascha Sohl-Dickstein
- 2022 Spotlight: Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm
  Lechao Xiao · Jeffrey Pennington
- 2022 Spotlight: Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling
  Jiri Hron · Roman Novak · Jeffrey Pennington · Jascha Sohl-Dickstein
- 2022 Spotlight: Deep equilibrium networks are sensitive to initialization statistics
  Atish Agarwala · Samuel Schoenholz
- 2022 Poster: Only tails matter: Average-Case Universality and Robustness in the Convex Regime
  Leonardo Cunha · Gauthier Gidel · Fabian Pedregosa · Damien Scieur · Courtney Paquette
- 2022 Poster: On Implicit Bias in Overparameterized Bilevel Optimization
  Paul Vicol · Jonathan Lorraine · Fabian Pedregosa · David Duvenaud · Roger Grosse
- 2022 Spotlight: On Implicit Bias in Overparameterized Bilevel Optimization
  Paul Vicol · Jonathan Lorraine · Fabian Pedregosa · David Duvenaud · Roger Grosse
- 2022 Spotlight: Only tails matter: Average-Case Universality and Robustness in the Convex Regime
  Leonardo Cunha · Gauthier Gidel · Fabian Pedregosa · Damien Scieur · Courtney Paquette
- 2021: The Mystery of Generalization: Why Does Deep Learning Work?
  Jeffrey Pennington
- 2021: Introduction
  Fabian Pedregosa · Courtney Paquette
- 2021 Tutorial: Random Matrix Theory and ML (RMT+ML)
  Fabian Pedregosa · Courtney Paquette · Thomas Trogdon · Jeffrey Pennington
- 2020 Poster: Acceleration through spectral density estimation
  Fabian Pedregosa · Damien Scieur
- 2020 Poster: Universal Asymptotic Optimality of Polyak Momentum
  Damien Scieur · Fabian Pedregosa
- 2020 Poster: The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization
  Ben Adlam · Jeffrey Pennington
- 2020 Poster: Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization
  Geoffrey Negiar · Gideon Dresdner · Alicia Yi-Ting Tsai · Laurent El Ghaoui · Francesco Locatello · Robert Freund · Fabian Pedregosa
- 2020 Poster: Disentangling Trainability and Generalization in Deep Neural Networks
  Lechao Xiao · Jeffrey Pennington · Samuel Schoenholz
- 2019: Poster discussion
  Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari
- 2019 Workshop: Theoretical Physics for Deep Learning
  Jaehoon Lee · Jeffrey Pennington · Yasaman Bahri · Max Welling · Surya Ganguli · Joan Bruna
- 2019: Opening Remarks
  Jaehoon Lee · Jeffrey Pennington · Yasaman Bahri · Max Welling · Surya Ganguli · Joan Bruna
- 2018 Poster: Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks
  Minmin Chen · Jeffrey Pennington · Samuel Schoenholz
- 2018 Oral: Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks
  Minmin Chen · Jeffrey Pennington · Samuel Schoenholz
- 2018 Poster: Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
  Lechao Xiao · Yasaman Bahri · Jascha Sohl-Dickstein · Samuel Schoenholz · Jeffrey Pennington
- 2018 Oral: Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
  Lechao Xiao · Yasaman Bahri · Jascha Sohl-Dickstein · Samuel Schoenholz · Jeffrey Pennington
- 2017 Poster: Geometry of Neural Network Loss Surfaces via Random Matrix Theory
  Jeffrey Pennington · Yasaman Bahri
- 2017 Talk: Geometry of Neural Network Loss Surfaces via Random Matrix Theory
  Jeffrey Pennington · Yasaman Bahri