We identify and formalize a fundamental gradient descent phenomenon leading to a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task, despite the presence of other predictive features that fail to be discovered. This work provides a theoretical explanation for the emergence of such feature imbalances in neural networks. Using tools from Dynamical Systems theory, we identify simple properties of learning dynamics during gradient descent that lead to this imbalance, and prove that such a situation can be expected given certain statistical structure in training data. Based on our proposed formalism, we develop guarantees for a novel but simple regularization method aimed at decoupling feature learning dynamics, improving accuracy and robustness in cases hindered by gradient starvation. We illustrate our findings with simple and real-world out-of-distribution (OOD) generalization experiments.
Author Information
Mohammad Pezeshki (Mila, Université de Montréal)
Sékou-Oumar Kaba (Mila, McGill University)
Yoshua Bengio (Mila - Quebec AI Institute)
Aaron Courville (University of Montreal)
Doina Precup (DeepMind)
Guillaume Lajoie (Mila, Université de Montréal)
More from the Same Authors
-
2021 : Epoch-Wise Double Descent: A Theory of Multi-scale Feature Learning Dynamics »
Mohammad Pezeshki · Amartya Mitra · Yoshua Bengio · Guillaume Lajoie -
2021 : Exploration-Driven Representation Learning in Reinforcement Learning »
Akram Erraqabi · Mingde Zhao · Marlos C. Machado · Yoshua Bengio · Sainbayar Sukhbaatar · Ludovic Denoyer · Alessandro Lazaric -
2021 : Variational Causal Networks: Approximate Bayesian Inference over Causal Structures »
Yashas Annadani · Jonas Rothfuss · Alexandre Lacoste · Nino Scherrer · Anirudh Goyal · Yoshua Bengio · Stefan Bauer -
2022 : On the Generalization and Adaption Performance of Causal Models »
Nino Scherrer · Anirudh Goyal · Stefan Bauer · Yoshua Bengio · Rosemary Nan Ke -
2022 : Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty »
Thomas George · Guillaume Lajoie · Aristide Baratin -
2022 : MAgNet: Mesh Agnostic Neural PDE Solver »
Oussama Boussif · Yoshua Bengio · Loubna Benabbou · Dan Assouline -
2022 : Unsupervised Model-based Pre-training for Data-efficient Reinforcement Learning from Pixels »
Sai Rajeswar · Pietro Mazzaglia · Tim Verbelen · Alex Piche · Bart Dhoedt · Aaron Courville · Alexandre Lacoste -
2022 Workshop: Hardware-aware efficient training (HAET) »
Gonçalo Mordido · Yoshua Bengio · Ghouthi BOUKLI HACENE · Vincent Gripon · François Leduc-Primeau · Vahid Partovi Nia · Julie Grollier -
2022 : Is a Modular Architecture Enough? »
Sarthak Mittal · Yoshua Bengio · Guillaume Lajoie -
2022 Poster: Building Robust Ensembles via Margin Boosting »
Dinghuai Zhang · Hongyang Zhang · Aaron Courville · Yoshua Bengio · Pradeep Ravikumar · Arun Sai Suggala -
2022 Poster: Multi-scale Feature Learning Dynamics: Insights for Double Descent »
Mohammad Pezeshki · Amartya Mitra · Yoshua Bengio · Guillaume Lajoie -
2022 Poster: Proving Theorems using Incremental Learning and Hindsight Experience Replay »
Eser Aygün · Ankit Anand · Laurent Orseau · Xavier Glorot · Stephen McAleer · Vlad Firoiu · Lei Zhang · Doina Precup · Shibl Mourad -
2022 Spotlight: Proving Theorems using Incremental Learning and Hindsight Experience Replay »
Eser Aygün · Ankit Anand · Laurent Orseau · Xavier Glorot · Stephen McAleer · Vlad Firoiu · Lei Zhang · Doina Precup · Shibl Mourad -
2022 Spotlight: Building Robust Ensembles via Margin Boosting »
Dinghuai Zhang · Hongyang Zhang · Aaron Courville · Yoshua Bengio · Pradeep Ravikumar · Arun Sai Suggala -
2022 Spotlight: Multi-scale Feature Learning Dynamics: Insights for Double Descent »
Mohammad Pezeshki · Amartya Mitra · Yoshua Bengio · Guillaume Lajoie -
2022 Poster: Biological Sequence Design with GFlowNets »
Moksh Jain · Emmanuel Bengio · Alex Hernandez-Garcia · Jarrid Rector-Brooks · Bonaventure Dossou · Chanakya Ekbote · Jie Fu · Tianyu Zhang · Michael Kilgour · Dinghuai Zhang · Lena Simine · Payel Das · Yoshua Bengio -
2022 Spotlight: Biological Sequence Design with GFlowNets »
Moksh Jain · Emmanuel Bengio · Alex Hernandez-Garcia · Jarrid Rector-Brooks · Bonaventure Dossou · Chanakya Ekbote · Jie Fu · Tianyu Zhang · Michael Kilgour · Dinghuai Zhang · Lena Simine · Payel Das · Yoshua Bengio -
2022 Poster: Generative Flow Networks for Discrete Probabilistic Modeling »
Dinghuai Zhang · Nikolay Malkin · Zhen Liu · Alexandra Volokhova · Aaron Courville · Yoshua Bengio -
2022 Poster: Towards Scaling Difference Target Propagation by Learning Backprop Targets »
Maxence ERNOULT · Fabrice Normandin · Abhinav Moudgil · Sean Spinney · Eugene Belilovsky · Irina Rish · Blake Richards · Yoshua Bengio -
2022 Spotlight: Towards Scaling Difference Target Propagation by Learning Backprop Targets »
Maxence ERNOULT · Fabrice Normandin · Abhinav Moudgil · Sean Spinney · Eugene Belilovsky · Irina Rish · Blake Richards · Yoshua Bengio -
2022 Spotlight: Generative Flow Networks for Discrete Probabilistic Modeling »
Dinghuai Zhang · Nikolay Malkin · Zhen Liu · Alexandra Volokhova · Aaron Courville · Yoshua Bengio -
2021 Workshop: Tackling Climate Change with Machine Learning »
Hari Prasanna Das · Katarzyna Tokarska · Maria João Sousa · Meareg Hailemariam · David Rolnick · Xiaoxiang Zhu · Yoshua Bengio -
2021 Poster: An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming »
Minkai Xu · Wujie Wang · Shitong Luo · Chence Shi · Yoshua Bengio · Rafael Gomez-Bombarelli · Jian Tang -
2021 Spotlight: An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming »
Minkai Xu · Wujie Wang · Shitong Luo · Chence Shi · Yoshua Bengio · Rafael Gomez-Bombarelli · Jian Tang -
2020 : QA for invited talk 4 Bengio »
Yoshua Bengio -
2020 : Invited talk 4 Bengio »
Yoshua Bengio -
2020 : Keynote: Yoshua Bengio (Q&A) »
Yoshua Bengio -
2020 : Keynote: Yoshua Bengio »
Yoshua Bengio -
2020 Workshop: Object-Oriented Learning: Perception, Representation, and Reasoning »
Sungjin Ahn · Adam Kosiorek · Jessica Hamrick · Sjoerd van Steenkiste · Yoshua Bengio -
2020 Workshop: MLRetrospectives: A Venue for Self-Reflection in ML Research »
Jessica Forde · Jesse Dodge · Mayoore Jaiswal · Rosanne Liu · Ryan Lowe · Joelle Pineau · Yoshua Bengio -
2020 Poster: Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules »
Sarthak Mittal · Alex Lamb · Anirudh Goyal · Vikram Voleti · Murray Shanahan · Guillaume Lajoie · Michael Mozer · Yoshua Bengio -
2020 Poster: Learning to Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning »
Sai Krishna Gottipati · Boris Sattarov · Sufeng Niu · Yashaswi Pathak · Haoran Wei · Shengchao Liu · Simon Blackburn · Karam Thomas · Connor Coley · Jian Tang · Sarath Chandar · Yoshua Bengio -
2020 Poster: Perceptual Generative Autoencoders »
Zijun Zhang · Ruixiang ZHANG · Zongpeng Li · Yoshua Bengio · Liam Paull -
2020 Poster: Revisiting Fundamentals of Experience Replay »
William Fedus · Prajit Ramachandran · Rishabh Agarwal · Yoshua Bengio · Hugo Larochelle · Mark Rowland · Will Dabney -
2020 Poster: Small-GAN: Speeding up GAN Training using Core-Sets »
Samrath Sinha · Han Zhang · Anirudh Goyal · Yoshua Bengio · Hugo Larochelle · Augustus Odena -
2020 Poster: What can I do here? A Theory of Affordances in Reinforcement Learning »
Khimya Khetarpal · Zafarali Ahmed · Gheorghe Comanici · David Abel · Doina Precup -
2019 : AI Commons »
Yoshua Bengio -
2019 : Opening remarks »
Yoshua Bengio -
2019 Workshop: AI For Social Good (AISG) »
Margaux Luck · Kris Sankaran · Tristan Sylvain · Sean McGregor · Jonnie Penn · Girmaw Abebe Tadesse · Virgile Sylvain · Myriam Côté · Lester Mackey · Rayid Ghani · Yoshua Bengio -
2019 : Panel Discussion »
Yoshua Bengio · Andrew Ng · Raia Hadsell · John Platt · Claire Monteleoni · Jennifer Chayes -
2019 : Poster discussion »
Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari -
2019 : Personalized Visualization of the Impact of Climate Change »
Yoshua Bengio -
2019 : Networking Lunch (provided) + Poster Session »
Abraham Stanway · Alex Robson · Aneesh Rangnekar · Ashesh Chattopadhyay · Ashley Pilipiszyn · Benjamin LeRoy · Bolong Cheng · Ce Zhang · Chaopeng Shen · Christian Schroeder · Christian Clough · Clement DUHART · Clement Fung · Cozmin Ududec · Dali Wang · David Dao · di wu · Dimitrios Giannakis · Dino Sejdinovic · Doina Precup · Duncan Watson-Parris · Gege Wen · George Chen · Gopal Erinjippurath · Haifeng Li · Han Zou · Herke van Hoof · Hillary A Scannell · Hiroshi Mamitsuka · Hongbao Zhang · Jaegul Choo · James Wang · James Requeima · Jessica Hwang · Jinfan Xu · Johan Mathe · Jonathan Binas · Joonseok Lee · Kalai Ramea · Kate Duffy · Kevin McCloskey · Kris Sankaran · Lester Mackey · Letif Mones · Loubna Benabbou · Lynn Kaack · Matthew Hoffman · Mayur Mudigonda · Mehrdad Mahdavi · Michael McCourt · Mingchao Jiang · Mohammad Mahdi Kamani · Neel Guha · Niccolo Dalmasso · Nick Pawlowski · Nikola Milojevic-Dupont · Paulo Orenstein · Pedram Hassanzadeh · Pekka Marttinen · Ramesh Nair · Sadegh Farhang · Samuel Kaski · Sandeep Manjanna · Sasha Luccioni · Shuby Deshpande · Soo Kim · Soukayna Mouatadid · Sunghyun Park · Tao Lin · Telmo Felgueira · Thomas Hornigold · Tianle Yuan · Tom Beucler · Tracy Cui · Volodymyr Kuleshov · Wei Yu · yang song · Ydo Wexler · Yoshua Bengio · Zhecheng Wang · Zhuangfang Yi · Zouheir Malki -
2019 Workshop: Climate Change: How Can AI Help? »
David Rolnick · Alexandre Lacoste · Tegan Maharaj · Jennifer Chayes · Yoshua Bengio -
2019 Poster: State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations »
Alex Lamb · Jonathan Binas · Anirudh Goyal · Sandeep Subramanian · Ioannis Mitliagkas · Yoshua Bengio · Michael Mozer -
2019 Poster: On the Spectral Bias of Neural Networks »
Nasim Rahaman · Aristide Baratin · Devansh Arpit · Felix Draxler · Min Lin · Fred Hamprecht · Yoshua Bengio · Aaron Courville -
2019 Oral: On the Spectral Bias of Neural Networks »
Nasim Rahaman · Aristide Baratin · Devansh Arpit · Felix Draxler · Min Lin · Fred Hamprecht · Yoshua Bengio · Aaron Courville -
2019 Oral: State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations »
Alex Lamb · Jonathan Binas · Anirudh Goyal · Sandeep Subramanian · Ioannis Mitliagkas · Yoshua Bengio · Michael Mozer -
2019 Poster: Manifold Mixup: Better Representations by Interpolating Hidden States »
Vikas Verma · Alex Lamb · Christopher Beckham · Amir Najafi · Ioannis Mitliagkas · David Lopez-Paz · Yoshua Bengio -
2019 Poster: Per-Decision Option Discounting »
Anna Harutyunyan · Peter Vrancx · Philippe Hamel · Ann Nowe · Doina Precup -
2019 Poster: GMNN: Graph Markov Neural Networks »
Meng Qu · Yoshua Bengio · Jian Tang -
2019 Oral: Per-Decision Option Discounting »
Anna Harutyunyan · Peter Vrancx · Philippe Hamel · Ann Nowe · Doina Precup -
2019 Oral: GMNN: Graph Markov Neural Networks »
Meng Qu · Yoshua Bengio · Jian Tang -
2019 Oral: Manifold Mixup: Better Representations by Interpolating Hidden States »
Vikas Verma · Alex Lamb · Christopher Beckham · Amir Najafi · Ioannis Mitliagkas · David Lopez-Paz · Yoshua Bengio -
2018 Poster: Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data »
Amjad Almahairi · Sai Rajeswar · Alessandro Sordoni · Philip Bachman · Aaron Courville -
2018 Poster: Mutual Information Neural Estimation »
Mohamed Belghazi · Aristide Baratin · Sai Rajeswar · Sherjil Ozair · Yoshua Bengio · R Devon Hjelm · Aaron Courville -
2018 Oral: Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data »
Amjad Almahairi · Sai Rajeswar · Alessandro Sordoni · Philip Bachman · Aaron Courville -
2018 Oral: Mutual Information Neural Estimation »
Mohamed Belghazi · Aristide Baratin · Sai Rajeswar · Sherjil Ozair · Yoshua Bengio · R Devon Hjelm · Aaron Courville -
2018 Poster: Focused Hierarchical RNNs for Conditional Sequence Processing »
Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal -
2018 Oral: Focused Hierarchical RNNs for Conditional Sequence Processing »
Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal -
2017 Workshop: Reproducibility in Machine Learning Research »
Rosemary Nan Ke · Anirudh Goyal · Alex Lamb · Joelle Pineau · Samy Bengio · Yoshua Bengio -
2017 Poster: Sharp Minima Can Generalize For Deep Nets »
Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio -
2017 Poster: A Closer Look at Memorization in Deep Networks »
David Krueger · Yoshua Bengio · Stanislaw Jastrzebski · Maxinder S. Kanwal · Nicolas Ballas · Asja Fischer · Emmanuel Bengio · Devansh Arpit · Tegan Maharaj · Aaron Courville · Simon Lacoste-Julien -
2017 Talk: A Closer Look at Memorization in Deep Networks »
David Krueger · Yoshua Bengio · Stanislaw Jastrzebski · Maxinder S. Kanwal · Nicolas Ballas · Asja Fischer · Emmanuel Bengio · Devansh Arpit · Tegan Maharaj · Aaron Courville · Simon Lacoste-Julien -
2017 Talk: Sharp Minima Can Generalize For Deep Nets »
Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio