Our theoretical understanding of deep learning has not kept pace with its empirical success. While network architecture is known to be critical, we do not yet understand its effect on learned representations and network behavior, or how this architecture should reflect task structure. In this work, we begin to address this gap by introducing the Gated Deep Linear Network framework, which schematizes how pathways of information flow shape learning dynamics within an architecture. Crucially, because of the gating, these networks can compute nonlinear functions of their input. We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning. Our analysis demonstrates that the learning dynamics in structured networks can be conceptualized as a neural race with an implicit bias towards shared representations, which then govern the model's ability to systematically generalize, multi-task, and transfer. We validate our key insights on naturalistic datasets and under relaxed assumptions. Taken together, our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures and the role of modularity and compositionality in solving real-world problems. The code and results are available at https://www.saxelab.org/gated-dln.
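To illustrate the kind of architecture the abstract describes, the sketch below implements a toy gated network in which context-dependent binary gates select among linear pathways. The pathway count, dimensions, and gating patterns here are hypothetical and not taken from the paper; the point is only that gating makes the overall input-output map nonlinear even though each pathway is linear.

```python
import numpy as np

# Minimal sketch (not the paper's exact formulation): a gated deep linear
# network with two linear pathways. Each pathway applies W2_p @ W1_p to the
# input, and a binary gate g_p, set by the task context, decides whether that
# pathway contributes to the output.

rng = np.random.default_rng(0)

input_dim, hidden_dim, output_dim = 8, 16, 4
n_pathways = 2

# One pair of weight matrices per pathway (hypothetical sizes).
W1 = [rng.normal(scale=0.1, size=(hidden_dim, input_dim)) for _ in range(n_pathways)]
W2 = [rng.normal(scale=0.1, size=(output_dim, hidden_dim)) for _ in range(n_pathways)]

def forward(x, gates):
    """Sum the contributions of the active (gated-on) linear pathways.

    x     : (input_dim,) input vector
    gates : length-n_pathways sequence of 0/1 gating values set by context
    """
    return sum(g * (W2[p] @ (W1[p] @ x)) for p, g in enumerate(gates))

x = rng.normal(size=input_dim)
print(forward(x, gates=[1, 0]))  # only pathway 0 contributes
print(forward(x, gates=[1, 1]))  # both pathways contribute
```

Because the gating pattern changes with context, the composite map from (input, context) to output is nonlinear overall, even though each active pathway computes a purely linear transformation.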
Author Information
Andrew Saxe (UCL)
Shagun Sodhani (Facebook AI Research)
Sam Lewallen (University College London)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: The Neural Race Reduction: Dynamics of Abstraction in Gated Networks
  Wed. Jul 20th, 09:25 -- 09:30 PM, Ballroom 3 & 4
More from the Same Authors
- 2022 Poster: Robust Policy Learning over Multiple Uncertainty Sets
  Annie Xie · Shagun Sodhani · Chelsea Finn · Joelle Pineau · Amy Zhang
- 2022 Spotlight: Robust Policy Learning over Multiple Uncertainty Sets
  Annie Xie · Shagun Sodhani · Chelsea Finn · Joelle Pineau · Amy Zhang
- 2022 Poster: Maslow's Hammer in Catastrophic Forgetting: Node Re-Use vs. Node Activation
  Sebastian Lee · Stefano Sarao Mannelli · Claudia Clopath · Sebastian Goldt · Andrew Saxe
- 2022 Spotlight: Maslow's Hammer in Catastrophic Forgetting: Node Re-Use vs. Node Activation
  Sebastian Lee · Stefano Sarao Mannelli · Claudia Clopath · Sebastian Goldt · Andrew Saxe
- 2021 Poster: Multi-Task Reinforcement Learning with Context-based Representations
  Shagun Sodhani · Amy Zhang · Joelle Pineau
- 2021 Spotlight: Multi-Task Reinforcement Learning with Context-based Representations
  Shagun Sodhani · Amy Zhang · Joelle Pineau
- 2020: Concluding Remarks
  Sarath Chandar · Shagun Sodhani
- 2020: Q&A by Rich Sutton
  Richard Sutton · Shagun Sodhani · Sarath Chandar
- 2020: Q&A with Irina Rish
  Irina Rish · Shagun Sodhani · Sarath Chandar
- 2020: Q&A with Jürgen Schmidhuber
  Jürgen Schmidhuber · Shagun Sodhani · Sarath Chandar
- 2020: Q&A with Partha Pratim Talukdar
  Partha Talukdar · Shagun Sodhani · Sarath Chandar
- 2020: Q&A with Katja Hofmann
  Katja Hofmann · Luisa Zintgraf · Rika Antonova · Sarath Chandar · Shagun Sodhani
- 2020 Workshop: 4th Lifelong Learning Workshop
  Shagun Sodhani · Sarath Chandar · Balaraman Ravindran · Doina Precup
- 2020: Opening Comments
  Sarath Chandar · Shagun Sodhani
- 2020 Poster: Invariant Causal Prediction for Block MDPs
  Amy Zhang · Clare Lyle · Shagun Sodhani · Angelos Filos · Marta Kwiatkowska · Joelle Pineau · Yarin Gal · Doina Precup
- 2019: Andrew Saxe: Intriguing phenomena in training and generalization dynamics of deep networks
  Andrew Saxe
- 2017 Poster: Hierarchy Through Composition with Multitask LMDPs
  Andrew Saxe · Adam Earle · Benjamin Rosman
- 2017 Talk: Hierarchy Through Composition with Multitask LMDPs
  Andrew Saxe · Adam Earle · Benjamin Rosman