Timezone: »
Gating mechanisms are widely used in neural network models, where they allow gradients to backpropagate easily through depth or time. However, their saturation property introduces problems of its own. For example, in recurrent models these gates need to have outputs near 1 to propagate information over long time-delays, which requires them to operate in their saturation regime and hinders gradient-based learning of the gate mechanism. We address this problem by deriving two synergistic modifications to the standard gating mechanism that are easy to implement, introduce no additional hyperparameters, and improve learnability of the gates when they are close to saturation. We show how these changes are related to and improve on alternative recently proposed gating mechanisms such as chrono-initialization and Ordered Neurons. Empirically, our simple gating mechanisms robustly improve the performance of recurrent models on a range of applications, including synthetic memorization tasks, sequential image classification, language modeling, and reinforcement learning, particularly when long-term dependencies are involved.
Author Information
Albert Gu (Stanford University)
Caglar Gulcehre (DeepMind)
Thomas Paine (DeepMind)
Matthew Hoffman (DeepMind)
Razvan Pascanu (DeepMind)
More from the Same Authors
-
2022 : Towards a General Purpose CNN for Long Range Dependencies in $N$D »
David Romero · David Knigge · Albert Gu · Erik Bekkers · Efstratios Gavves · Jakub Tomczak · Mark Hoogendoorn -
2022 : Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? »
Nenad Tomasev · Ioana Bica · Brian McWilliams · Lars Buesing · Razvan Pascanu · Charles Blundell · Jovana Mitrovic -
2023 : On the Universality of Linear Recurrences Followed by Nonlinear Projections »
Antonio Orvieto · Soham De · Razvan Pascanu · Caglar Gulcehre · Samuel Smith -
2023 : Structured State Space Models for In-Context Reinforcement Learning »
Christopher Lu · Yannick Schroecker · Albert Gu · Emilio Parisotto · Jakob Foerster · Satinder Singh · Feryal Behbahani -
2023 : Latent Space Representations of Neural Algorithmic Reasoners »
Vladimir V. Mirjanić · Razvan Pascanu · Petar Veličković -
2023 : Asynchronous Algorithmic Alignment with Cocycles »
Andrew Dudzik · Tamara von Glehn · Razvan Pascanu · Petar Veličković -
2023 : Asynchronous Algorithmic Alignment with Cocycles »
Andrew Dudzik · Tamara von Glehn · Razvan Pascanu · Petar Veličković -
2023 Oral: Resurrecting Recurrent Neural Networks for Long Sequences »
Antonio Orvieto · Samuel Smith · Albert Gu · Anushan Fernando · Caglar Gulcehre · Razvan Pascanu · Soham De -
2023 Oral: Understanding Plasticity in Neural Networks »
Clare Lyle · Zeyu Zheng · Evgenii Nikishin · Bernardo Avila Pires · Razvan Pascanu · Will Dabney -
2023 Poster: Understanding Plasticity in Neural Networks »
Clare Lyle · Zeyu Zheng · Evgenii Nikishin · Bernardo Avila Pires · Razvan Pascanu · Will Dabney -
2023 Poster: Resurrecting Recurrent Neural Networks for Long Sequences »
Antonio Orvieto · Samuel Smith · Albert Gu · Anushan Fernando · Caglar Gulcehre · Razvan Pascanu · Soham De -
2022 Poster: Wide Neural Networks Forget Less Catastrophically »
Seyed Iman Mirzadeh · Arslan Chaudhry · Dong Yin · Huiyi Hu · Razvan Pascanu · Dilan Gorur · Mehrdad Farajtabar -
2022 Poster: It’s Raw! Audio Generation with State-Space Models »
Karan Goel · Albert Gu · Chris Donahue · Christopher Re -
2022 Spotlight: Wide Neural Networks Forget Less Catastrophically »
Seyed Iman Mirzadeh · Arslan Chaudhry · Dong Yin · Huiyi Hu · Razvan Pascanu · Dilan Gorur · Mehrdad Farajtabar -
2022 Oral: It’s Raw! Audio Generation with State-Space Models »
Karan Goel · Albert Gu · Chris Donahue · Christopher Re -
2022 Poster: The CLRS Algorithmic Reasoning Benchmark »
Petar Veličković · Adrià Puigdomenech Badia · David Budden · Razvan Pascanu · Andrea Banino · Misha Dashevskiy · Raia Hadsell · Charles Blundell -
2022 Spotlight: The CLRS Algorithmic Reasoning Benchmark »
Petar Veličković · Adrià Puigdomenech Badia · David Budden · Razvan Pascanu · Andrea Banino · Misha Dashevskiy · Raia Hadsell · Charles Blundell -
2021 : Invited Talk #4 »
Razvan Pascanu -
2021 : Panel Discussion1 »
Razvan Pascanu · Irina Rish -
2021 Poster: HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projections »
Ines Chami · Albert Gu · Dat P Nguyen · Christopher Re -
2021 Spotlight: HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projections »
Ines Chami · Albert Gu · Dat P Nguyen · Christopher Re -
2021 Poster: Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective »
Florin Gogianu · Tudor Berariu · Mihaela Rosca · Claudia Clopath · Lucian Busoniu · Razvan Pascanu -
2021 Spotlight: Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective »
Florin Gogianu · Tudor Berariu · Mihaela Rosca · Claudia Clopath · Lucian Busoniu · Razvan Pascanu -
2021 Poster: Catformer: Designing Stable Transformers via Sensitivity Analysis »
Jared Quincy Davis · Albert Gu · Krzysztof Choromanski · Tri Dao · Christopher Re · Chelsea Finn · Percy Liang -
2021 Spotlight: Catformer: Designing Stable Transformers via Sensitivity Analysis »
Jared Quincy Davis · Albert Gu · Krzysztof Choromanski · Tri Dao · Christopher Re · Chelsea Finn · Percy Liang -
2020 : Invited Talk: Razvan Pascanu "Continual Learning from an Optimization/Learning-dynamics perspective" »
Razvan Pascanu -
2020 Workshop: Workshop on Continual Learning »
Haytham Fayek · Arslan Chaudhry · David Lopez-Paz · Eugene Belilovsky · Jonathan Richard Schwarz · Marc Pickett · Rahaf Aljundi · Sayna Ebrahimi · Razvan Pascanu · Puneet Dokania -
2020 Poster: Stabilizing Transformers for Reinforcement Learning »
Emilio Parisotto · Francis Song · Jack Rae · Razvan Pascanu · Caglar Gulcehre · Siddhant Jayakumar · Max Jaderberg · Raphael Lopez Kaufman · Aidan Clark · Seb Noury · Matthew Botvinick · Nicolas Heess · Raia Hadsell -
2019 Poster: Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations »
Tri Dao · Albert Gu · Matthew Eichhorn · Atri Rudra · Christopher Re -
2019 Oral: Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations »
Tri Dao · Albert Gu · Matthew Eichhorn · Atri Rudra · Christopher Re -
2019 Poster: Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning »
Natasha Jaques · Angeliki Lazaridou · Edward Hughes · Caglar Gulcehre · Pedro Ortega · DJ Strouse · Joel Z Leibo · Nando de Freitas -
2019 Oral: Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning »
Natasha Jaques · Angeliki Lazaridou · Edward Hughes · Caglar Gulcehre · Pedro Ortega · DJ Strouse · Joel Z Leibo · Nando de Freitas -
2019 Poster: A Kernel Theory of Modern Data Augmentation »
Tri Dao · Albert Gu · Alexander J Ratner · Virginia Smith · Christopher De Sa · Christopher Re -
2019 Oral: A Kernel Theory of Modern Data Augmentation »
Tri Dao · Albert Gu · Alexander J Ratner · Virginia Smith · Christopher De Sa · Christopher Re -
2018 Poster: Progress & Compress: A scalable framework for continual learning »
Jonathan Richard Schwarz · Wojciech Czarnecki · Jelena Luketina · Agnieszka Grabska-Barwinska · Yee Teh · Razvan Pascanu · Raia Hadsell -
2018 Poster: Mix & Match - Agent Curricula for Reinforcement Learning »
Wojciech Czarnecki · Siddhant Jayakumar · Max Jaderberg · Leonard Hasenclever · Yee Teh · Nicolas Heess · Simon Osindero · Razvan Pascanu -
2018 Oral: Progress & Compress: A scalable framework for continual learning »
Jonathan Richard Schwarz · Wojciech Czarnecki · Jelena Luketina · Agnieszka Grabska-Barwinska · Yee Teh · Razvan Pascanu · Raia Hadsell -
2018 Oral: Mix & Match - Agent Curricula for Reinforcement Learning »
Wojciech Czarnecki · Siddhant Jayakumar · Max Jaderberg · Leonard Hasenclever · Yee Teh · Nicolas Heess · Simon Osindero · Razvan Pascanu -
2018 Poster: Representation Tradeoffs for Hyperbolic Embeddings »
Frederic Sala · Christopher De Sa · Albert Gu · Christopher Re -
2018 Oral: Representation Tradeoffs for Hyperbolic Embeddings »
Frederic Sala · Christopher De Sa · Albert Gu · Christopher Re -
2018 Poster: Been There, Done That: Meta-Learning with Episodic Recall »
Samuel Ritter · Jane Wang · Zeb Kurth-Nelson · Siddhant Jayakumar · Charles Blundell · Razvan Pascanu · Matthew Botvinick -
2018 Oral: Been There, Done That: Meta-Learning with Episodic Recall »
Samuel Ritter · Jane Wang · Zeb Kurth-Nelson · Siddhant Jayakumar · Charles Blundell · Razvan Pascanu · Matthew Botvinick -
2017 Poster: Sharp Minima Can Generalize For Deep Nets »
Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio -
2017 Poster: Learning to Learn without Gradient Descent by Gradient Descent »
Yutian Chen · Matthew Hoffman · Sergio Gómez Colmenarejo · Misha Denil · Timothy Lillicrap · Matthew Botvinick · Nando de Freitas -
2017 Poster: Learned Optimizers that Scale and Generalize »
Olga Wichrowska · Niru Maheswaranathan · Matthew Hoffman · Sergio Gómez Colmenarejo · Misha Denil · Nando de Freitas · Jascha Sohl-Dickstein -
2017 Talk: Learned Optimizers that Scale and Generalize »
Olga Wichrowska · Niru Maheswaranathan · Matthew Hoffman · Sergio Gómez Colmenarejo · Misha Denil · Nando de Freitas · Jascha Sohl-Dickstein -
2017 Talk: Learning to Learn without Gradient Descent by Gradient Descent »
Yutian Chen · Matthew Hoffman · Sergio Gómez Colmenarejo · Misha Denil · Timothy Lillicrap · Matthew Botvinick · Nando de Freitas -
2017 Talk: Sharp Minima Can Generalize For Deep Nets »
Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio