Timezone: »
Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well. An emerging paradigm for describing this unexpected behavior is in terms of a \emph{double descent} curve, in which increasing a model's capacity causes its test error to first decrease, then increase to a maximum near the interpolation threshold, and then decrease again in the overparameterized regime. Recent efforts to explain this phenomenon theoretically have focused on simple settings, such as linear regression or kernel regression with unstructured random features, which we argue are too coarse to reveal important nuances of actual neural networks. We provide a precise high-dimensional asymptotic analysis of generalization under kernel regression with the Neural Tangent Kernel, which characterizes the behavior of wide neural networks optimized with gradient descent. Our results reveal that the test error has nonmonotonic behavior deep in the overparameterized regime and can even exhibit additional peaks and descents when the number of parameters scales quadratically with the dataset size.
Author Information
Ben Adlam (Google Brain)
Jeffrey Pennington (Google Brain)
More from the Same Authors
-
2023 Poster: Underspecification Presents Challenges for Credibility in Modern Machine Learning »
Alexander D'Amour · Katherine Heller · Dan Moldovan · Ben Adlam · Babak Alipanahi · Alex Beutel · Christina Chen · Jonathan Deaton · Jacob Eisenstein · Matthew Hoffman · Farhad Hormozdiari · Neil Houlsby · Shaobo Hou · Ghassen Jerfel · Alan Karthikesalingam · Mario Lucic · Yian Ma · Cory McLean · Diana Mincu · Akinori Mitani · Andrea Montanari · Zachary Nado · Vivek Natarajan · Christopher Nielson · Thomas F. Osborne · Rajiv Raman · Kim Ramasamy · Rory sayres · Jessica Schrouff · Martin Seneviratne · Shannon Sequeira · Harini Suresh · Victor Veitch · Maksym Vladymyrov · Xuezhi Wang · Kellie Webster · Steve Yadlowsky · Taedong Yun · Xiaohua Zhai · D. Sculley -
2023 Poster: Second-order regression models exhibit progressive sharpening to the edge of stability »
Atish Agarwala · Fabian Pedregosa · Jeffrey Pennington -
2022 Poster: Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm »
Lechao Xiao · Jeffrey Pennington -
2022 Poster: Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling »
Jiri Hron · Roman Novak · Jeffrey Pennington · Jascha Sohl-Dickstein -
2022 Spotlight: Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm »
Lechao Xiao · Jeffrey Pennington -
2022 Spotlight: Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling »
Jiri Hron · Roman Novak · Jeffrey Pennington · Jascha Sohl-Dickstein -
2021 : The Mystery of Generalization: Why Does Deep Learning Work? »
Jeffrey Pennington -
2021 Tutorial: Random Matrix Theory and ML (RMT+ML) »
Fabian Pedregosa · Courtney Paquette · Thomas Trogdon · Jeffrey Pennington -
2020 Poster: Disentangling Trainability and Generalization in Deep Neural Networks »
Lechao Xiao · Jeffrey Pennington · Samuel Schoenholz -
2019 : Poster discussion »
Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari -
2019 Workshop: Theoretical Physics for Deep Learning »
Jaehoon Lee · Jeffrey Pennington · Yasaman Bahri · Max Welling · Surya Ganguli · Joan Bruna -
2019 : Opening Remarks »
Jaehoon Lee · Jeffrey Pennington · Yasaman Bahri · Max Welling · Surya Ganguli · Joan Bruna -
2018 Poster: Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks »
Minmin Chen · Jeffrey Pennington · Samuel Schoenholz -
2018 Oral: Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks »
Minmin Chen · Jeffrey Pennington · Samuel Schoenholz -
2018 Poster: Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks »
Lechao Xiao · Yasaman Bahri · Jascha Sohl-Dickstein · Samuel Schoenholz · Jeffrey Pennington -
2018 Oral: Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks »
Lechao Xiao · Yasaman Bahri · Jascha Sohl-Dickstein · Samuel Schoenholz · Jeffrey Pennington -
2017 Poster: Geometry of Neural Network Loss Surfaces via Random Matrix Theory »
Jeffrey Pennington · Yasaman Bahri -
2017 Talk: Geometry of Neural Network Loss Surfaces via Random Matrix Theory »
Jeffrey Pennington · Yasaman Bahri