We introduce Mix & Match (M&M), a training framework designed to facilitate rapid and effective learning in RL agents that would be too slow or too challenging to train otherwise. The key innovation is a procedure that allows us to automatically form a curriculum over agents. Through such a curriculum we can progressively train more complex agents by, effectively, bootstrapping from solutions found by simpler agents. In contrast to typical curriculum learning approaches, we do not gradually modify the tasks or environments presented, but instead use a process to gradually alter how the policy is represented internally. We show the broad applicability of our method by demonstrating significant performance gains in three different experimental setups: (1) We train an agent able to control more than 700 actions in a challenging 3D first-person task; using our method to progress through an action-space curriculum, we achieve both faster training and better final performance than one obtains using traditional methods. (2) We further show that M&M can be used successfully to progress through a curriculum of architectural variants defining an agent's internal state. (3) Finally, we illustrate how a variant of our method can be used to improve agent performance in a multitask setting.
Author Information
Wojciech Czarnecki (DeepMind)
Siddhant Jayakumar (DeepMind)
Max Jaderberg (DeepMind)
Leonard Hasenclever (DeepMind)
Yee Teh (DeepMind)
Nicolas Heess (DeepMind)
Simon Osindero (DeepMind)
Razvan Pascanu (DeepMind)
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Poster: Mix & Match - Agent Curricula for Reinforcement Learning »
Fri. Jul 13th 04:15 -- 07:00 PM Room Hall B #13
More from the Same Authors
-
2022 : Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? »
Nenad Tomasev · Ioana Bica · Brian McWilliams · Lars Buesing · Razvan Pascanu · Charles Blundell · Jovana Mitrovic -
2023 Poster: Resurrecting Recurrent Neural Networks for Long Sequences »
Antonio Orvieto · Samuel Smith · Albert Gu · Anushan Fernando · Caglar Gulcehre · Razvan Pascanu · Soham De -
2023 Poster: Understanding Plasticity in Neural Networks »
Clare Lyle · Zeyu Zheng · Evgenii Nikishin · Bernardo Avila Pires · Razvan Pascanu · Will Dabney -
2023 Oral: Resurrecting Recurrent Neural Networks for Long Sequences »
Antonio Orvieto · Samuel Smith · Albert Gu · Anushan Fernando · Caglar Gulcehre · Razvan Pascanu · Soham De -
2023 Oral: Understanding Plasticity in Neural Networks »
Clare Lyle · Zeyu Zheng · Evgenii Nikishin · Bernardo Avila Pires · Razvan Pascanu · Will Dabney -
2022 Poster: Wide Neural Networks Forget Less Catastrophically »
Seyed Iman Mirzadeh · Arslan Chaudhry · Dong Yin · Huiyi Hu · Razvan Pascanu · Dilan Gorur · Mehrdad Farajtabar -
2022 Poster: Retrieval-Augmented Reinforcement Learning »
Anirudh Goyal · Abe Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell -
2022 Spotlight: Retrieval-Augmented Reinforcement Learning »
Anirudh Goyal · Abe Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell -
2022 Spotlight: Wide Neural Networks Forget Less Catastrophically »
Seyed Iman Mirzadeh · Arslan Chaudhry · Dong Yin · Huiyi Hu · Razvan Pascanu · Dilan Gorur · Mehrdad Farajtabar -
2022 Poster: The CLRS Algorithmic Reasoning Benchmark »
Petar Veličković · Adrià Puigdomenech Badia · David Budden · Razvan Pascanu · Andrea Banino · Misha Dashevskiy · Raia Hadsell · Charles Blundell -
2022 Poster: Model-Value Inconsistency as a Signal for Epistemic Uncertainty »
Angelos Filos · Eszter Vértes · Zita Marinho · Gregory Farquhar · Diana Borsa · Abe Friesen · Feryal Behbahani · Tom Schaul · Andre Barreto · Simon Osindero -
2022 Spotlight: The CLRS Algorithmic Reasoning Benchmark »
Petar Veličković · Adrià Puigdomenech Badia · David Budden · Razvan Pascanu · Andrea Banino · Misha Dashevskiy · Raia Hadsell · Charles Blundell -
2022 Spotlight: Model-Value Inconsistency as a Signal for Epistemic Uncertainty »
Angelos Filos · Eszter Vértes · Zita Marinho · Gregory Farquhar · Diana Borsa · Abe Friesen · Feryal Behbahani · Tom Schaul · Andre Barreto · Simon Osindero -
2022 Poster: Improving Language Models by Retrieving from Trillions of Tokens »
Sebastian Borgeaud · Arthur Mensch · Jordan Hoffmann · Trevor Cai · Eliza Rutherford · Katie Millican · George van den Driessche · Jean-Baptiste Lespiau · Bogdan Damoc · Aidan Clark · Diego de Las Casas · Aurelia Guy · Jacob Menick · Roman Ring · Tom Hennigan · Saffron Huang · Loren Maggiore · Chris Jones · Albin Cassirer · Andy Brock · Michela Paganini · Geoffrey Irving · Oriol Vinyals · Simon Osindero · Karen Simonyan · Jack Rae · Erich Elsen · Laurent Sifre -
2022 Poster: Unified Scaling Laws for Routed Language Models »
Aidan Clark · Diego de Las Casas · Aurelia Guy · Arthur Mensch · Michela Paganini · Jordan Hoffmann · Bogdan Damoc · Blake Hechtman · Trevor Cai · Sebastian Borgeaud · George van den Driessche · Eliza Rutherford · Tom Hennigan · Matthew Johnson · Albin Cassirer · Chris Jones · Elena Buchatskaya · David Budden · Laurent Sifre · Simon Osindero · Oriol Vinyals · Marc'Aurelio Ranzato · Jack Rae · Erich Elsen · Koray Kavukcuoglu · Karen Simonyan -
2022 Poster: Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games »
Siqi Liu · Marc Lanctot · Luke Marris · Nicolas Heess -
2022 Spotlight: Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games »
Siqi Liu · Marc Lanctot · Luke Marris · Nicolas Heess -
2022 Spotlight: Improving Language Models by Retrieving from Trillions of Tokens »
Sebastian Borgeaud · Arthur Mensch · Jordan Hoffmann · Trevor Cai · Eliza Rutherford · Katie Millican · George van den Driessche · Jean-Baptiste Lespiau · Bogdan Damoc · Aidan Clark · Diego de Las Casas · Aurelia Guy · Jacob Menick · Roman Ring · Tom Hennigan · Saffron Huang · Loren Maggiore · Chris Jones · Albin Cassirer · Andy Brock · Michela Paganini · Geoffrey Irving · Oriol Vinyals · Simon Osindero · Karen Simonyan · Jack Rae · Erich Elsen · Laurent Sifre -
2022 Oral: Unified Scaling Laws for Routed Language Models »
Aidan Clark · Diego de Las Casas · Aurelia Guy · Arthur Mensch · Michela Paganini · Jordan Hoffmann · Bogdan Damoc · Blake Hechtman · Trevor Cai · Sebastian Borgeaud · George van den Driessche · Eliza Rutherford · Tom Hennigan · Matthew Johnson · Albin Cassirer · Chris Jones · Elena Buchatskaya · David Budden · Laurent Sifre · Simon Osindero · Oriol Vinyals · Marc'Aurelio Ranzato · Jack Rae · Erich Elsen · Koray Kavukcuoglu · Karen Simonyan -
2021 : Invited Talk #4 »
Razvan Pascanu -
2021 : Panel Discussion1 »
Razvan Pascanu · Irina Rish -
2021 Test Of Time: Bayesian Learning via Stochastic Gradient Langevin Dynamics »
Yee Teh · Max Welling -
2021 Poster: Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective »
Florin Gogianu · Tudor Berariu · Mihaela Rosca · Claudia Clopath · Lucian Busoniu · Razvan Pascanu -
2021 Poster: Data-efficient Hindsight Off-policy Option Learning »
Markus Wulfmeier · Dushyant Rao · Roland Hafner · Thomas Lampe · Abbas Abdolmaleki · Tim Hertweck · Michael Neunert · Dhruva Tirumala Bukkapatnam · Noah Siegel · Nicolas Heess · Martin Riedmiller -
2021 Spotlight: Data-efficient Hindsight Off-policy Option Learning »
Markus Wulfmeier · Dushyant Rao · Roland Hafner · Thomas Lampe · Abbas Abdolmaleki · Tim Hertweck · Michael Neunert · Dhruva Tirumala Bukkapatnam · Noah Siegel · Nicolas Heess · Martin Riedmiller -
2021 Spotlight: Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective »
Florin Gogianu · Tudor Berariu · Mihaela Rosca · Claudia Clopath · Lucian Busoniu · Razvan Pascanu -
2021 Poster: Counterfactual Credit Assignment in Model-Free Reinforcement Learning »
Thomas Mesnard · Theophane Weber · Fabio Viola · Shantanu Thakoor · Alaa Saade · Anna Harutyunyan · Will Dabney · Thomas Stepleton · Nicolas Heess · Arthur Guez · Eric Moulines · Marcus Hutter · Lars Buesing · Remi Munos -
2021 Spotlight: Counterfactual Credit Assignment in Model-Free Reinforcement Learning »
Thomas Mesnard · Theophane Weber · Fabio Viola · Shantanu Thakoor · Alaa Saade · Anna Harutyunyan · Will Dabney · Thomas Stepleton · Nicolas Heess · Arthur Guez · Eric Moulines · Marcus Hutter · Lars Buesing · Remi Munos -
2020 : QA for invited talk 6 Heess »
Nicolas Heess -
2020 : Invited talk 6 Heess »
Nicolas Heess -
2020 : Open-ended environments for advancing RL Q&A »
Max Jaderberg · Katja Hofmann -
2020 : Open-ended environments for advancing RL »
Max Jaderberg -
2020 : Invited Talk: Razvan Pascanu "Continual Learning from an Optimization/Learning-dynamics perspective" »
Razvan Pascanu -
2020 Workshop: Workshop on Continual Learning »
Haytham Fayek · Arslan Chaudhry · David Lopez-Paz · Eugene Belilovsky · Jonathan Richard Schwarz · Marc Pickett · Rahaf Aljundi · Sayna Ebrahimi · Razvan Pascanu · Puneet Dokania -
2020 Poster: Small Data, Big Decisions: Model Selection in the Small-Data Regime »
Jorg Bornschein · Francesco Visin · Simon Osindero -
2020 Poster: CoMic: Complementary Task Learning & Mimicry for Reusable Skills »
Leonard Hasenclever · Fabio Pardo · Raia Hadsell · Nicolas Heess · Josh Merel -
2020 Poster: Stabilizing Transformers for Reinforcement Learning »
Emilio Parisotto · Francis Song · Jack Rae · Razvan Pascanu · Caglar Gulcehre · Siddhant Jayakumar · Max Jaderberg · Raphael Lopez Kaufman · Aidan Clark · Seb Noury · Matthew Botvinick · Nicolas Heess · Raia Hadsell -
2020 Poster: A distributional view on multi-objective policy optimization »
Abbas Abdolmaleki · Sandy Huang · Leonard Hasenclever · Michael Neunert · Francis Song · Martina Zambelli · Murilo Martins · Nicolas Heess · Raia Hadsell · Martin Riedmiller -
2020 Poster: Improving the Gating Mechanism of Recurrent Neural Networks »
Albert Gu · Caglar Gulcehre · Thomas Paine · Matthew Hoffman · Razvan Pascanu -
2019 : Nicolas Heess: TBD »
Nicolas Heess -
2019 Poster: Composing Entropic Policies using Divergence Correction »
Jonathan Hunt · Andre Barreto · Timothy Lillicrap · Nicolas Heess -
2019 Poster: Open-ended learning in symmetric zero-sum games »
David Balduzzi · Marta Garnelo · Yoram Bachrach · Wojciech Czarnecki · Julien Perolat · Max Jaderberg · Thore Graepel -
2019 Oral: Open-ended learning in symmetric zero-sum games »
David Balduzzi · Marta Garnelo · Yoram Bachrach · Wojciech Czarnecki · Julien Perolat · Max Jaderberg · Thore Graepel -
2019 Oral: Composing Entropic Policies using Divergence Correction »
Jonathan Hunt · Andre Barreto · Timothy Lillicrap · Nicolas Heess -
2018 Poster: Progress & Compress: A scalable framework for continual learning »
Jonathan Richard Schwarz · Wojciech Czarnecki · Jelena Luketina · Agnieszka Grabska-Barwinska · Yee Teh · Razvan Pascanu · Raia Hadsell -
2018 Oral: Progress & Compress: A scalable framework for continual learning »
Jonathan Richard Schwarz · Wojciech Czarnecki · Jelena Luketina · Agnieszka Grabska-Barwinska · Yee Teh · Razvan Pascanu · Raia Hadsell -
2018 Poster: Learning by Playing - Solving Sparse Reward Tasks from Scratch »
Martin Riedmiller · Roland Hafner · Thomas Lampe · Michael Neunert · Jonas Degrave · Tom Van de Wiele · Vlad Mnih · Nicolas Heess · Jost Springenberg -
2018 Poster: Graph Networks as Learnable Physics Engines for Inference and Control »
Alvaro Sanchez-Gonzalez · Nicolas Heess · Jost Springenberg · Josh Merel · Martin Riedmiller · Raia Hadsell · Peter Battaglia -
2018 Poster: Been There, Done That: Meta-Learning with Episodic Recall »
Samuel Ritter · Jane Wang · Zeb Kurth-Nelson · Siddhant Jayakumar · Charles Blundell · Razvan Pascanu · Matthew Botvinick -
2018 Poster: Conditional Neural Processes »
Marta Garnelo · Dan Rosenbaum · Chris Maddison · Tiago Ramalho · David Saxton · Murray Shanahan · Yee Teh · Danilo J. Rezende · S. M. Ali Eslami -
2018 Oral: Been There, Done That: Meta-Learning with Episodic Recall »
Samuel Ritter · Jane Wang · Zeb Kurth-Nelson · Siddhant Jayakumar · Charles Blundell · Razvan Pascanu · Matthew Botvinick -
2018 Oral: Learning by Playing - Solving Sparse Reward Tasks from Scratch »
Martin Riedmiller · Roland Hafner · Thomas Lampe · Michael Neunert · Jonas Degrave · Tom Van de Wiele · Vlad Mnih · Nicolas Heess · Jost Springenberg -
2018 Oral: Conditional Neural Processes »
Marta Garnelo · Dan Rosenbaum · Chris Maddison · Tiago Ramalho · David Saxton · Murray Shanahan · Yee Teh · Danilo J. Rezende · S. M. Ali Eslami -
2018 Oral: Graph Networks as Learnable Physics Engines for Inference and Control »
Alvaro Sanchez-Gonzalez · Nicolas Heess · Jost Springenberg · Josh Merel · Martin Riedmiller · Raia Hadsell · Peter Battaglia -
2017 Poster: FeUdal Networks for Hierarchical Reinforcement Learning »
Alexander Vezhnevets · Simon Osindero · Tom Schaul · Nicolas Heess · Max Jaderberg · David Silver · Koray Kavukcuoglu -
2017 Talk: FeUdal Networks for Hierarchical Reinforcement Learning »
Alexander Vezhnevets · Simon Osindero · Tom Schaul · Nicolas Heess · Max Jaderberg · David Silver · Koray Kavukcuoglu -
2017 Poster: Sharp Minima Can Generalize For Deep Nets »
Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio -
2017 Poster: Decoupled Neural Interfaces using Synthetic Gradients »
Max Jaderberg · Wojciech Czarnecki · Simon Osindero · Oriol Vinyals · Alex Graves · David Silver · Koray Kavukcuoglu -
2017 Poster: Understanding Synthetic Gradients and Decoupled Neural Interfaces »
Wojciech Czarnecki · Grzegorz Świrszcz · Max Jaderberg · Simon Osindero · Oriol Vinyals · Koray Kavukcuoglu -
2017 Talk: Sharp Minima Can Generalize For Deep Nets »
Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio -
2017 Talk: Understanding Synthetic Gradients and Decoupled Neural Interfaces »
Wojciech Czarnecki · Grzegorz Świrszcz · Max Jaderberg · Simon Osindero · Oriol Vinyals · Koray Kavukcuoglu -
2017 Talk: Decoupled Neural Interfaces using Synthetic Gradients »
Max Jaderberg · Wojciech Czarnecki · Simon Osindero · Oriol Vinyals · Alex Graves · David Silver · Koray Kavukcuoglu