Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty
Thomas George · Guillaume Lajoie · Aristide Baratin
Event URL: https://openreview.net/forum?id=2F8ktRFqvnM
A recent line of work has identified a so-called ‘lazy regime’ in which a deep network is well approximated by its linearization around initialization throughout training. Here we investigate the comparative effect of the lazy (linear) and feature-learning (non-linear) regimes on subgroups of examples defined by their difficulty. Specifically, we show that easier examples are given more weight in the feature-learning regime, and are therefore trained faster than more difficult ones. We illustrate this phenomenon across different ways of quantifying example difficulty, including the c-score, label noise, and the presence of spurious correlations.
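The contrast between the two regimes can be made concrete in a small experiment. Below is a minimal, hypothetical sketch in PyTorch (not the authors' code): it uses the common output-scaling trick, centering the network at its initialization and scaling the output by a factor alpha (with the learning rate rescaled by 1/alpha²), so that a large alpha pushes training toward the approximately linearized, lazy regime while alpha = 1 corresponds to the usual feature-learning regime. Example 'difficulty' is modeled here simply by label noise; the helper names (make_toy_data, run) and all hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch: lazy vs feature-learning training on easy vs hard examples.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_toy_data(n=512, d=20, noise_frac=0.2):
    # Linearly separable labels; a random subset gets its label flipped ("hard" group).
    X = torch.randn(n, d)
    y = (X @ torch.randn(d) > 0).float()
    hard = torch.rand(n) < noise_frac
    y[hard] = 1.0 - y[hard]
    return X, y, hard

def run(X, y, hard, alpha, steps=2000, lr=0.5):
    model = nn.Sequential(nn.Linear(X.shape[1], 256), nn.ReLU(), nn.Linear(256, 1))
    model0 = copy.deepcopy(model)  # frozen copy of the network at initialization
    for p in model0.parameters():
        p.requires_grad_(False)
    loss_fn = nn.BCEWithLogitsLoss(reduction="none")
    # Rescaling the learning rate by 1/alpha^2 keeps the output dynamics on a
    # comparable timescale across values of alpha.
    opt = torch.optim.SGD(model.parameters(), lr=lr / alpha**2)
    for _ in range(steps):
        opt.zero_grad()
        # Centered, alpha-scaled output: a large alpha keeps the weights close to
        # initialization, approximating the linearized (lazy) model.
        out = alpha * (model(X) - model0(X)).squeeze(-1)
        loss_fn(out, y).mean().backward()
        opt.step()
    with torch.no_grad():
        per_ex = loss_fn(alpha * (model(X) - model0(X)).squeeze(-1), y)
    print(f"alpha={alpha:>5.0f}: easy-group loss {per_ex[~hard].mean():.3f} | "
          f"hard-group loss {per_ex[hard].mean():.3f}")

X, y, hard = make_toy_data()
run(X, y, hard, alpha=1.0)    # feature-learning (non-linear) regime
run(X, y, hard, alpha=100.0)  # approximately lazy (linear) regime
```

Tracking the per-group loss over the course of training (rather than only at the end, as above) is the natural way to visualize the difference in learning schedule that the abstract describes; the snapshot printed here is only the minimal version of that comparison.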
Author Information
Thomas George (Mila - Université de Montréal)
Guillaume Lajoie (Mila, Université de Montréal)
Aristide Baratin (MILA)
More from the Same Authors
- 2021: Gradient Starvation: A Learning Proclivity in Neural Networks
  Mohammad Pezeshki · Sékou-Oumar Kaba · Yoshua Bengio · Aaron Courville · Doina Precup · Guillaume Lajoie
- 2021: Epoch-Wise Double Descent: A Theory of Multi-scale Feature Learning Dynamics
  Mohammad Pezeshki · Amartya Mitra · Yoshua Bengio · Guillaume Lajoie
- 2023 Poster: Flexible Phase Dynamics for Bio-plausible Contrastive Learning
  Ezekiel Williams · Colin Bredenberg · Guillaume Lajoie
- 2023 Poster: CrossSplit: Mitigating Label Noise Memorization through Data Splitting
  Jihye Kim · Aristide Baratin · Yan Zhang · Simon Lacoste-Julien
- 2022: Is a Modular Architecture Enough?
  Sarthak Mittal · Yoshua Bengio · Guillaume Lajoie
- 2022 Poster: Multi-scale Feature Learning Dynamics: Insights for Double Descent
  Mohammad Pezeshki · Amartya Mitra · Yoshua Bengio · Guillaume Lajoie
- 2022 Spotlight: Multi-scale Feature Learning Dynamics: Insights for Double Descent
  Mohammad Pezeshki · Amartya Mitra · Yoshua Bengio · Guillaume Lajoie
- 2020 Poster: Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules
  Sarthak Mittal · Alex Lamb · Anirudh Goyal · Vikram Voleti · Murray Shanahan · Guillaume Lajoie · Michael Mozer · Yoshua Bengio
- 2019: Poster discussion
  Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari
- 2019 Poster: On the Spectral Bias of Neural Networks
  Nasim Rahaman · Aristide Baratin · Devansh Arpit · Felix Draxler · Min Lin · Fred Hamprecht · Yoshua Bengio · Aaron Courville
- 2019 Oral: On the Spectral Bias of Neural Networks
  Nasim Rahaman · Aristide Baratin · Devansh Arpit · Felix Draxler · Min Lin · Fred Hamprecht · Yoshua Bengio · Aaron Courville