Oral
Fast Parametric Learning with Activation Memorization
Jack Rae · Chris Dyer · Peter Dayan · Timothy Lillicrap

Wed Jul 11 07:40 AM -- 07:50 AM (PDT) @ Victoria

Neural networks trained with backpropagation often struggle to identify classes that have been observed only a small number of times. In applications where most class labels are rare, such as language modelling, this can become a performance bottleneck. One potential remedy is to augment the network with a fast-learning non-parametric model which stores recent activations and class labels in an external memory. We explore a simplified architecture where we treat a subset of the model parameters as fast memory stores. This can help retain information over longer time intervals than a traditional memory, and does not require additional space or compute. In the case of image classification, we demonstrate faster binding of novel classes on an Omniglot image curriculum task. We also show improved performance for word-based language models on news reports (GigaWord), books (Project Gutenberg) and Wikipedia articles (WikiText-103), with the latter achieving a state-of-the-art perplexity of 29.2.
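
The idea of treating a subset of parameters as a fast memory store can be illustrated with a minimal sketch. It is not taken from the paper: the function name `memorize_activation`, the `max_fast_updates` cutoff and the 1/count mixing rate are all assumptions chosen for illustration. The sketch assumes the "subset of parameters" is the output-layer row of the observed class, and that this row is blended toward the current hidden activation while the class is still rare.

```python
import numpy as np

def memorize_activation(weights, counts, hidden, label, max_fast_updates=3):
    """Blend the output row of `label` toward `hidden` (hypothetical rule).

    weights: (num_classes, hidden_size) output-layer matrix, updated in place.
    counts:  (num_classes,) integer array of how often each class was seen.
    Returns the mixing rate that was applied (0.0 means no fast update).
    """
    counts[label] += 1
    if counts[label] > max_fast_updates:
        # Assumption: frequent classes are left to ordinary gradient descent.
        return 0.0
    # With rate 1/count the row is the running mean of the activations seen
    # so far, so a class seen once is bound to its activation immediately.
    rate = 1.0 / counts[label]
    weights[label] = rate * hidden + (1.0 - rate) * weights[label]
    return rate

# Toy usage: 4 classes, hidden size 3, class 2 observed twice.
weights = np.zeros((4, 3))
counts = np.zeros(4, dtype=int)
first = np.array([1.0, 0.0, 2.0])
second = np.array([3.0, 0.0, 0.0])
rate_one = memorize_activation(weights, counts, first, label=2)
rate_two = memorize_activation(weights, counts, second, label=2)
```

Because the update writes into parameters the model already has, this sketch needs no separate memory buffer, which is consistent with the claim above that no additional space is required. A real training loop would presumably combine it with the usual gradient step.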

Author Information

Jack Rae (DeepMind)
Chris Dyer (DeepMind)
Peter Dayan (UCL)
Timothy Lillicrap (Google DeepMind)

More from the Same Authors

  • 2022 Poster: Retrieval-Augmented Reinforcement Learning »
    Anirudh Goyal · Abe Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell
  • 2022 Spotlight: Retrieval-Augmented Reinforcement Learning »
    Anirudh Goyal · Abe Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell
  • 2022 Poster: A data-driven approach for learning to control computers »
    Peter Humphreys · David Raposo · Tobias Pohlen · Gregory Thornton · Rachita Chhaparia · Alistair Muldal · Josh Abramson · Petko Georgiev · Adam Santoro · Timothy Lillicrap
  • 2022 Spotlight: A data-driven approach for learning to control computers »
    Peter Humphreys · David Raposo · Tobias Pohlen · Gregory Thornton · Rachita Chhaparia · Alistair Muldal · Josh Abramson · Petko Georgiev · Adam Santoro · Timothy Lillicrap
  • 2022 Poster: Improving Language Models by Retrieving from Trillions of Tokens »
    Sebastian Borgeaud · Arthur Mensch · Jordan Hoffmann · Trevor Cai · Eliza Rutherford · Katie Millican · George van den Driessche · Jean-Baptiste Lespiau · Bogdan Damoc · Aidan Clark · Diego de Las Casas · Aurelia Guy · Jacob Menick · Roman Ring · Tom Hennigan · Saffron Huang · Loren Maggiore · Chris Jones · Albin Cassirer · Andy Brock · Michela Paganini · Geoffrey Irving · Oriol Vinyals · Simon Osindero · Karen Simonyan · Jack Rae · Erich Elsen · Laurent Sifre
  • 2022 Poster: Unified Scaling Laws for Routed Language Models »
    Aidan Clark · Diego de Las Casas · Aurelia Guy · Arthur Mensch · Michela Paganini · Jordan Hoffmann · Bogdan Damoc · Blake Hechtman · Trevor Cai · Sebastian Borgeaud · George van den Driessche · Eliza Rutherford · Tom Hennigan · Matthew Johnson · Albin Cassirer · Chris Jones · Elena Buchatskaya · David Budden · Laurent Sifre · Simon Osindero · Oriol Vinyals · Marc'Aurelio Ranzato · Jack Rae · Erich Elsen · Koray Kavukcuoglu · Karen Simonyan
  • 2022 Spotlight: Improving Language Models by Retrieving from Trillions of Tokens »
    Sebastian Borgeaud · Arthur Mensch · Jordan Hoffmann · Trevor Cai · Eliza Rutherford · Katie Millican · George van den Driessche · Jean-Baptiste Lespiau · Bogdan Damoc · Aidan Clark · Diego de Las Casas · Aurelia Guy · Jacob Menick · Roman Ring · Tom Hennigan · Saffron Huang · Loren Maggiore · Chris Jones · Albin Cassirer · Andy Brock · Michela Paganini · Geoffrey Irving · Oriol Vinyals · Simon Osindero · Karen Simonyan · Jack Rae · Erich Elsen · Laurent Sifre
  • 2022 Oral: Unified Scaling Laws for Routed Language Models »
    Aidan Clark · Diego de Las Casas · Aurelia Guy · Arthur Mensch · Michela Paganini · Jordan Hoffmann · Bogdan Damoc · Blake Hechtman · Trevor Cai · Sebastian Borgeaud · George van den Driessche · Eliza Rutherford · Tom Hennigan · Matthew Johnson · Albin Cassirer · Chris Jones · Elena Buchatskaya · David Budden · Laurent Sifre · Simon Osindero · Oriol Vinyals · Marc'Aurelio Ranzato · Jack Rae · Erich Elsen · Koray Kavukcuoglu · Karen Simonyan
  • 2020 Poster: Stabilizing Transformers for Reinforcement Learning »
    Emilio Parisotto · Francis Song · Jack Rae · Razvan Pascanu · Caglar Gulcehre · Siddhant Jayakumar · Max Jaderberg · Raphael Lopez Kaufman · Aidan Clark · Seb Noury · Matthew Botvinick · Nicolas Heess · Raia Hadsell
  • 2019 Poster: Learning Latent Dynamics for Planning from Pixels »
    Danijar Hafner · Timothy Lillicrap · Ian Fischer · Ruben Villegas · David Ha · Honglak Lee · James Davidson
  • 2019 Poster: Meta-Learning Neural Bloom Filters »
    Jack Rae · Sergey Bartunov · Timothy Lillicrap
  • 2019 Oral: Meta-Learning Neural Bloom Filters »
    Jack Rae · Sergey Bartunov · Timothy Lillicrap
  • 2019 Oral: Learning Latent Dynamics for Planning from Pixels »
    Danijar Hafner · Timothy Lillicrap · Ian Fischer · Ruben Villegas · David Ha · Honglak Lee · James Davidson
  • 2019 Poster: Deep Compressed Sensing »
    Yan Wu · Mihaela Rosca · Timothy Lillicrap
  • 2019 Oral: Deep Compressed Sensing »
    Yan Wu · Mihaela Rosca · Timothy Lillicrap
  • 2019 Poster: Composing Entropic Policies using Divergence Correction »
    Jonathan Hunt · Andre Barreto · Timothy Lillicrap · Nicolas Heess
  • 2019 Poster: An Investigation of Model-Free Planning »
    Arthur Guez · Mehdi Mirza · Karol Gregor · Rishabh Kabra · Sebastien Racaniere · Theophane Weber · David Raposo · Adam Santoro · Laurent Orseau · Tom Eccles · Greg Wayne · David Silver · Timothy Lillicrap
  • 2019 Oral: An Investigation of Model-Free Planning »
    Arthur Guez · Mehdi Mirza · Karol Gregor · Rishabh Kabra · Sebastien Racaniere · Theophane Weber · David Raposo · Adam Santoro · Laurent Orseau · Tom Eccles · Greg Wayne · David Silver · Timothy Lillicrap
  • 2019 Oral: Composing Entropic Policies using Divergence Correction »
    Jonathan Hunt · Andre Barreto · Timothy Lillicrap · Nicolas Heess
  • 2018 Poster: Measuring abstract reasoning in neural networks »
    Adam Santoro · Felix Hill · David GT Barrett · Ari S Morcos · Timothy Lillicrap
  • 2018 Oral: Measuring abstract reasoning in neural networks »
    Adam Santoro · Felix Hill · David GT Barrett · Ari S Morcos · Timothy Lillicrap
  • 2017 Workshop: Learning to Generate Natural Language »
    Yishu Miao · Wang Ling · Tsung-Hsien Wen · Kris Cao · Daniela Gerz · Phil Blunsom · Chris Dyer
  • 2017 Poster: Learning to Learn without Gradient Descent by Gradient Descent »
    Yutian Chen · Matthew Hoffman · Sergio Gómez Colmenarejo · Misha Denil · Timothy Lillicrap · Matthew Botvinick · Nando de Freitas
  • 2017 Talk: Learning to Learn without Gradient Descent by Gradient Descent »
    Yutian Chen · Matthew Hoffman · Sergio Gómez Colmenarejo · Misha Denil · Timothy Lillicrap · Matthew Botvinick · Nando de Freitas