A longstanding goal in the theory of deep learning is to characterize the conditions under which a given neural network architecture will be trainable, and if so, how well it might generalize to unseen data. In this work, we provide such a characterization in the limit of very wide and very deep networks, for which the analysis simplifies considerably. For wide networks, the trajectory under gradient descent is governed by the Neural Tangent Kernel (NTK), and for deep networks the NTK itself maintains only weak data dependence. By analyzing the spectrum of the NTK, we formulate necessary conditions for trainability and generalization across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs). We identify large regions of hyperparameter space for which networks can memorize the training set but completely fail to generalize. We find that CNNs without global average pooling behave almost identically to FCNs, but that CNNs with pooling have markedly different and often better generalization performance. These theoretical results are corroborated experimentally on CIFAR10 for a variety of network architectures.
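As context for the claim that the NTK spectrum controls trainability, the standard linearized-dynamics result for wide networks trained by gradient flow on the squared loss is sketched below. This is a background identity from the NTK literature, included here for orientation rather than as a restatement of the paper's derivations; $\Theta$ denotes the NTK evaluated on the training set $\mathcal{X}$, $\mathcal{Y}$ the training targets, and $\eta$ the learning rate.

```latex
\begin{align}
  % Predictions on the training set evolve linearly in function space:
  f_t(\mathcal{X}) &= \mathcal{Y} + e^{-\eta \Theta t}\,\bigl(f_0(\mathcal{X}) - \mathcal{Y}\bigr), \\
  % so along the i-th eigenvector v_i of \Theta (with eigenvalue \lambda_i)
  % the training residual decays at rate \eta \lambda_i:
  v_i^{\top}\bigl(f_t(\mathcal{X}) - \mathcal{Y}\bigr)
    &= e^{-\eta \lambda_i t}\, v_i^{\top}\bigl(f_0(\mathcal{X}) - \mathcal{Y}\bigr).
\end{align}
```

Modes whose eigenvalues $\lambda_i$ are vanishingly small relative to $\lambda_{\max}$, which bounds the stable learning rate, take exponentially long to fit; this is the sense in which the conditioning of the NTK spectrum constrains trainability.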
We include a \href{https://colab.research.google.com/github/google/neural-tangents/blob/master/notebooks/disentanglingtrainabilityand_generalization.ipynb}{Colab} notebook that reproduces the essential results of the paper.
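For readers who want to inspect these quantities directly, a minimal sketch in the spirit of the linked notebook is given below. It uses the neural-tangents library, but the architecture and the random placeholder inputs are illustrative assumptions, not the paper's experimental setup.

```python
import jax.numpy as jnp
from jax import random
from neural_tangents import stax

# A small infinitely wide fully connected Erf network (depth and widths
# chosen for illustration only; not the paper's configuration).
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Erf(),
    stax.Dense(512), stax.Erf(),
    stax.Dense(1),
)

# Random placeholder inputs standing in for a real dataset such as CIFAR10.
key = random.PRNGKey(0)
x_train = random.normal(key, (64, 32))

# Analytic NTK of the infinite-width network on the training inputs.
ntk_train = kernel_fn(x_train, x_train, 'ntk')

# In the wide limit, the eigenvalues of this matrix set the timescales on
# which gradient descent fits each mode of the training data.
eigenvalues = jnp.linalg.eigvalsh(ntk_train)
print(eigenvalues[:5], eigenvalues[-5:])
```

Because the kernel returned by `kernel_fn` is the analytic infinite-width NTK, its spectrum can be examined without training any finite network.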
Author Information
Lechao Xiao (Google Research)
Jeffrey Pennington (Google Brain)
Samuel Schoenholz (Google Brain)
More from the Same Authors
- 2023 Poster: Second-order regression models exhibit progressive sharpening to the edge of stability »
  Atish Agarwala · Fabian Pedregosa · Jeffrey Pennington
- 2022 Poster: Fast Finite Width Neural Tangent Kernel »
  Roman Novak · Jascha Sohl-Dickstein · Samuel Schoenholz
- 2022 Spotlight: Fast Finite Width Neural Tangent Kernel »
  Roman Novak · Jascha Sohl-Dickstein · Samuel Schoenholz
- 2022 Poster: Deep equilibrium networks are sensitive to initialization statistics »
  Atish Agarwala · Samuel Schoenholz
- 2022 Poster: Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm »
  Lechao Xiao · Jeffrey Pennington
- 2022 Poster: Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling »
  Jiri Hron · Roman Novak · Jeffrey Pennington · Jascha Sohl-Dickstein
- 2022 Spotlight: Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm »
  Lechao Xiao · Jeffrey Pennington
- 2022 Spotlight: Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling »
  Jiri Hron · Roman Novak · Jeffrey Pennington · Jascha Sohl-Dickstein
- 2022 Spotlight: Deep equilibrium networks are sensitive to initialization statistics »
  Atish Agarwala · Samuel Schoenholz
- 2021 Poster: Learn2Hop: Learned Optimization on Rough Landscapes »
  Amil Merchant · Luke Metz · Samuel Schoenholz · Ekin Dogus Cubuk
- 2021 Spotlight: Learn2Hop: Learned Optimization on Rough Landscapes »
  Amil Merchant · Luke Metz · Samuel Schoenholz · Ekin Dogus Cubuk
- 2021 Poster: Tilting the playing field: Dynamical loss functions for machine learning »
  Miguel Ruiz Garcia · Ge Zhang · Samuel Schoenholz · Andrea Liu
- 2021 Oral: Tilting the playing field: Dynamical loss functions for machine learning »
  Miguel Ruiz Garcia · Ge Zhang · Samuel Schoenholz · Andrea Liu
- 2021 Poster: Whitening and Second Order Optimization Both Make Information in the Dataset Unusable During Training, and Can Reduce or Prevent Generalization »
  Neha Wadia · Daniel Duckworth · Samuel Schoenholz · Ethan Dyer · Jascha Sohl-Dickstein
- 2021 Spotlight: Whitening and Second Order Optimization Both Make Information in the Dataset Unusable During Training, and Can Reduce or Prevent Generalization »
  Neha Wadia · Daniel Duckworth · Samuel Schoenholz · Ethan Dyer · Jascha Sohl-Dickstein
- 2021 : The Mystery of Generalization: Why Does Deep Learning Work? »
  Jeffrey Pennington
- 2021 Tutorial: Random Matrix Theory and ML (RMT+ML) »
  Fabian Pedregosa · Courtney Paquette · Thomas Trogdon · Jeffrey Pennington
- 2020 Poster: The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization »
  Ben Adlam · Jeffrey Pennington
- 2019 : Poster discussion »
  Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari
- 2019 Workshop: Theoretical Physics for Deep Learning »
  Jaehoon Lee · Jeffrey Pennington · Yasaman Bahri · Max Welling · Surya Ganguli · Joan Bruna
- 2019 : Opening Remarks »
  Jaehoon Lee · Jeffrey Pennington · Yasaman Bahri · Max Welling · Surya Ganguli · Joan Bruna
- 2018 Poster: Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks »
  Minmin Chen · Jeffrey Pennington · Samuel Schoenholz
- 2018 Oral: Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks »
  Minmin Chen · Jeffrey Pennington · Samuel Schoenholz
- 2018 Poster: Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks »
  Lechao Xiao · Yasaman Bahri · Jascha Sohl-Dickstein · Samuel Schoenholz · Jeffrey Pennington
- 2018 Oral: Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks »
  Lechao Xiao · Yasaman Bahri · Jascha Sohl-Dickstein · Samuel Schoenholz · Jeffrey Pennington
- 2017 Poster: Neural Message Passing for Quantum Chemistry »
  Justin Gilmer · Samuel Schoenholz · Patrick F Riley · Oriol Vinyals · George Dahl
- 2017 Talk: Neural Message Passing for Quantum Chemistry »
  Justin Gilmer · Samuel Schoenholz · Patrick F Riley · Oriol Vinyals · George Dahl
- 2017 Poster: Geometry of Neural Network Loss Surfaces via Random Matrix Theory »
  Jeffrey Pennington · Yasaman Bahri
- 2017 Talk: Geometry of Neural Network Loss Surfaces via Random Matrix Theory »
  Jeffrey Pennington · Yasaman Bahri