We numerically analyze the training dynamics of deep neural networks (DNNs) using methods developed in the statistical physics of glassy systems. We address two main questions: how complex the loss landscape and the dynamics within it are, and to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during training the dynamics slows down because of an increasingly large number of flat directions. At long times, when the loss approaches zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular the absence of barrier crossing, we find distinct dynamical behaviors in the two cases, which shows that the statistical properties of the corresponding loss and energy landscapes differ. In contrast, when the network is under-parametrized we observe typical glassy behavior, which suggests the existence of different phases depending on whether the network is under-parametrized or over-parametrized.
Author Information
Marco Baity-Jesi (Columbia University)
Levent Sagun (ENS/CEA)
Mario Geiger (EPFL)
Stefano Spigler (EPFL)
Gérard Ben Arous
Chiara Cammarota (King's College London)
Sep 2015 - present: Lecturer in the Mathematics Department, King's College London
Apr 2013 - Aug 2015: Researcher in the Physics Department, Sapienza University of Rome
Nov 2009 - Mar 2013: Postdoc at the Institut de Physique Théorique, CEA, Saclay
Nov 2006 - Oct 2009: PhD student in the Physics Department, Sapienza University of Rome
Nov 2004 - Oct 2006: Master's degree in Theoretical Physics, Physics Department, Sapienza University of Rome
Sep 2001 - Nov 2004: Undergraduate studies in the Physics Department, Sapienza University of Rome
Yann LeCun (New York University)
Matthieu Wyart
Giulio Biroli
Related Events (a corresponding poster, oral, or spotlight)
2018 Poster: Comparing Dynamics: Deep Neural Networks versus Glassy Systems
Wed. Jul 11th 04:15 -- 07:00 PM Room Hall B #168
More from the Same Authors
2021: On the interplay between data structure and loss function: an analytical study of generalization for classification
Stéphane d'Ascoli · Marylou Gabrié · Levent Sagun · Giulio Biroli
2022: Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Prior
Ravid Shwartz-Ziv · Micah Goldblum · Hossein Souri · Sanyam Kapoor · Chen Zhu · Yann LeCun · Andrew Wilson
2022: What Do We Maximize In Self-Supervised Learning?
Ravid Shwartz-Ziv · Randall Balestriero · Yann LeCun
2023 Poster: RankMe: Assessing the Downstream Performance of Pretrained Self-Supervised Representations by Their Rank
Quentin Garrido · Randall Balestriero · Laurent Najman · Yann LeCun
2023 Poster: The SSL Interplay: Augmentations, Inductive Bias, and Generalization
Vivien Cabannes · Bobak T Kiani · Randall Balestriero · Yann LeCun · Alberto Bietti
2023 Poster: What Can Be Learnt With Wide Convolutional Neural Networks?
Francesco Cagnetta · Alessandro Favero · Matthieu Wyart
2023 Oral: RankMe: Assessing the Downstream Performance of Pretrained Self-Supervised Representations by Their Rank
Quentin Garrido · Randall Balestriero · Laurent Najman · Yann LeCun
2023 Poster: Self-supervised learning of Split Invariant Equivariant representations
Quentin Garrido · Laurent Najman · Yann LeCun
2023 Poster: A Generalization of ViT/MLP-Mixer to Graphs
Xiaoxin He · Bryan Hooi · Thomas Laurent · Adam Perold · Yann LeCun · Xavier Bresson
2023 Poster: Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
Antonio Sclocchi · Mario Geiger · Matthieu Wyart
2022 Poster: Failure and success of the spectral bias prediction for Laplace Kernel Ridge Regression: the case of low-dimensional data
Umberto M. Tomasini · Antonio Sclocchi · Matthieu Wyart
2022 Spotlight: Failure and success of the spectral bias prediction for Laplace Kernel Ridge Regression: the case of low-dimensional data
Umberto M. Tomasini · Antonio Sclocchi · Matthieu Wyart
2018 Poster: Adversarially Regularized Autoencoders
Jake Zhao · Yoon Kim · Kelly Zhang · Alexander Rush · Yann LeCun
2018 Oral: Adversarially Regularized Autoencoders
Jake Zhao · Yoon Kim · Kelly Zhang · Alexander Rush · Yann LeCun
2017 Poster: Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNNs
Li Jing · Yichen Shen · Tena Dubcek · John E Peurifoy · Scott Skirlo · Yann LeCun · Max Tegmark · Marin Soljačić
2017 Talk: Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNNs
Li Jing · Yichen Shen · Tena Dubcek · John E Peurifoy · Scott Skirlo · Yann LeCun · Max Tegmark · Marin Soljačić