Poster
Geometry of Neural Network Loss Surfaces via Random Matrix Theory
Jeffrey Pennington · Yasaman Bahri
Understanding the geometry of neural network loss surfaces is important both for developing improved optimization algorithms and for building a theoretical understanding of why deep learning works. In this paper, we study the geometry in terms of the distribution of eigenvalues of the Hessian matrix at critical points of varying energy. We introduce an analytical framework and a set of tools from random matrix theory that allow us to compute an approximation of this distribution under a set of simplifying assumptions. The shape of the spectrum depends strongly on the energy and on another key parameter, $\phi$, the ratio of parameters to data points. Our analysis predicts, and numerical simulations support, that for critical points of small index, the number of negative eigenvalues scales like the $3/2$ power of the energy. We leave as an open problem an explanation for our observation that, in the context of a certain memorization task, the energy of minimizers is well-approximated by the function $\frac{1}{2}(1-\phi)^2$.
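As a rough, self-contained illustration of the quantities named in the abstract, the sketch below draws a random-matrix surrogate for a loss Hessian and tracks its index (the fraction of negative eigenvalues) as an energy-like scale grows. The surrogate (a Wishart-like positive-semidefinite part plus an energy-scaled Wigner-like part), the helper `sample_surrogate_hessian`, and the parameter values are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def sample_surrogate_hessian(n_params, phi, epsilon, rng):
    """Illustrative random-matrix surrogate for a loss Hessian (an assumption,
    not the paper's construction): a Wishart-like PSD part plus a Wigner-like
    symmetric part whose scale grows with the energy epsilon."""
    n_data = int(n_params / phi)          # phi = parameters / data points
    J = rng.standard_normal((n_params, n_data)) / np.sqrt(n_data)
    H0 = J @ J.T                          # Wishart-like, positive semidefinite
    A = rng.standard_normal((n_params, n_params))
    H1 = np.sqrt(epsilon) * (A + A.T) / np.sqrt(2 * n_params)  # Wigner-like
    return H0 + H1

rng = np.random.default_rng(0)
for epsilon in [0.0, 0.05, 0.2]:
    H = sample_surrogate_hessian(n_params=500, phi=0.5, epsilon=epsilon, rng=rng)
    eigs = np.linalg.eigvalsh(H)
    index = np.mean(eigs < 0)             # fraction of negative eigenvalues
    print(f"epsilon={epsilon:.2f}  index={index:.4f}  min_eig={eigs.min():+.3f}")
```

With these assumed choices and $\phi < 1$, the surrogate is positive definite at zero energy, and negative eigenvalues appear as the energy-like scale increases, qualitatively mirroring the energy dependence described in the abstract.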
Author Information
Jeffrey Pennington (Google Brain)
Yasaman Bahri (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)
- 2017 Talk: Geometry of Neural Network Loss Surfaces via Random Matrix Theory
  Mon. Aug 7th, 01:24 -- 01:42 AM, Room C4.8
More from the Same Authors
- 2023 Poster: Second-order regression models exhibit progressive sharpening to the edge of stability
  Atish Agarwala · Fabian Pedregosa · Jeffrey Pennington
- 2022 Poster: Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm
  Lechao Xiao · Jeffrey Pennington
- 2022 Poster: Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling
  Jiri Hron · Roman Novak · Jeffrey Pennington · Jascha Sohl-Dickstein
- 2022 Spotlight: Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm
  Lechao Xiao · Jeffrey Pennington
- 2022 Spotlight: Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling
  Jiri Hron · Roman Novak · Jeffrey Pennington · Jascha Sohl-Dickstein
- 2021 Workshop: Over-parameterization: Pitfalls and Opportunities
  Yasaman Bahri · Quanquan Gu · Amin Karbasi · Hanie Sedghi
- 2021: The Mystery of Generalization: Why Does Deep Learning Work?
  Jeffrey Pennington
- 2021 Tutorial: Random Matrix Theory and ML (RMT+ML)
  Fabian Pedregosa · Courtney Paquette · Thomas Trogdon · Jeffrey Pennington
- 2020 Poster: The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization
  Ben Adlam · Jeffrey Pennington
- 2020 Poster: Infinite attention: NNGP and NTK for deep attention networks
  Jiri Hron · Yasaman Bahri · Jascha Sohl-Dickstein · Roman Novak
- 2020 Poster: Disentangling Trainability and Generalization in Deep Neural Networks
  Lechao Xiao · Jeffrey Pennington · Samuel Schoenholz
- 2019: Poster discussion
  Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari
- 2019 Workshop: Theoretical Physics for Deep Learning
  Jaehoon Lee · Jeffrey Pennington · Yasaman Bahri · Max Welling · Surya Ganguli · Joan Bruna
- 2019: Opening Remarks
  Jaehoon Lee · Jeffrey Pennington · Yasaman Bahri · Max Welling · Surya Ganguli · Joan Bruna
- 2018 Poster: Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks
  Minmin Chen · Jeffrey Pennington · Samuel Schoenholz
- 2018 Oral: Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks
  Minmin Chen · Jeffrey Pennington · Samuel Schoenholz
- 2018 Poster: Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
  Lechao Xiao · Yasaman Bahri · Jascha Sohl-Dickstein · Samuel Schoenholz · Jeffrey Pennington
- 2018 Oral: Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
  Lechao Xiao · Yasaman Bahri · Jascha Sohl-Dickstein · Samuel Schoenholz · Jeffrey Pennington