Human-Timescale Adaptation in an Open-Ended Task Space
Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL). In this work, we demonstrate that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. In a vast space of held-out environment dynamics, our adaptive agent (AdA) displays on-the-fly hypothesis-driven exploration, efficient exploitation of acquired knowledge, and can successfully be prompted with first-person demonstrations. Adaptation emerges from three ingredients: (1) meta-reinforcement learning across a vast, smooth and diverse task distribution, (2) a policy parameterised as a large-scale attention-based memory architecture, and (3) an effective automated curriculum that prioritises tasks at the frontier of an agent's capabilities. We demonstrate characteristic scaling laws with respect to network size, memory length, and richness of the training task distribution. We believe our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.
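The third ingredient, an automated curriculum that prioritises tasks at the frontier of the agent's capabilities, can be illustrated with a small sampler that weights each task by how uncertain its outcome currently is: tasks the agent always solves or never solves get low weight, tasks solved about half the time get the most. This is a minimal sketch under assumed design choices (the `FrontierCurriculum` name, Laplace smoothing, and the `p * (1 - p)` weighting are illustrative, not the paper's actual mechanism):

```python
import random
from collections import defaultdict


class FrontierCurriculum:
    """Sample training tasks near the frontier of the agent's capabilities.

    Illustrative sketch only: weights each task by p * (1 - p), where p is
    a smoothed estimate of the agent's success rate on that task.
    """

    def __init__(self, task_ids, smoothing=1.0):
        self.task_ids = list(task_ids)
        self.successes = defaultdict(float)
        self.attempts = defaultdict(float)
        self.smoothing = smoothing  # Laplace prior pulls unseen tasks toward p = 0.5

    def success_rate(self, task):
        s = self.successes[task] + self.smoothing
        n = self.attempts[task] + 2.0 * self.smoothing
        return s / n

    def weight(self, task):
        # Peaks at p = 0.5 (the "frontier"); near zero for tasks that are
        # reliably solved or currently hopeless.
        p = self.success_rate(task)
        return p * (1.0 - p)

    def sample(self, rng=random):
        weights = [self.weight(t) for t in self.task_ids]
        return rng.choices(self.task_ids, weights=weights, k=1)[0]

    def update(self, task, solved):
        self.attempts[task] += 1.0
        self.successes[task] += float(solved)
```

After a few rounds of `update`, a task solved roughly half the time dominates the sampling distribution, which matches the intuition of training at the edge of competence; curricula in the literature (e.g. regret-based environment design) use richer signals, but the frontier-seeking shape is the same.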
Author Information
Jakob Bauer (DeepMind)
Kate Baumli (Google DeepMind)
Feryal Behbahani (Google DeepMind)
Avishkar Bhoopchand (DeepMind)
Natalie Bradley-Schmieg
Michael Chang
Natalie Clay (DeepMind)
Adrian Collister
Vibhavari Dasagi (DeepMind)
Lucy Gonzalez
Karol Gregor (Google)
Edward Hughes (DeepMind)
Sheleem Kashem
Maria Loks-Thompson
Hannah Openshaw
Jack Parker-Holder (DeepMind)
Shreya Pathak
Nicolas Perez-Nieves (Imperial College London)
Nemanja Rakicevic (DeepMind)
Tim Rocktäschel (Facebook AI Research & University College London)
Yannick Schroecker (DeepMind)
Satinder Singh (DeepMind)
Jakub Sygnowski (Google)
Karl Tuyls (DeepMind)
Sarah York
Alexander Zacherl (DeepMind)
Lei Zhang (Google DeepMind)
Related Events (a corresponding poster, oral, or spotlight)
- 2023 Poster: Human-Timescale Adaptation in an Open-Ended Task Space
  Wed, Jul 26, 12:00–01:30 AM, Exhibit Hall 1 #811
More from the Same Authors
- 2021: Discovering Diverse Nearly Optimal Policies with Successor Features
  Tom Zahavy · Brendan O'Donoghue · Andre Barreto · Sebastian Flennerhag · Vlad Mnih · Satinder Singh
- 2021: Reward is enough for convex MDPs
  Tom Zahavy · Brendan O'Donoghue · Guillaume Desjardins · Satinder Singh
- 2023: Structured State Space Models for In-Context Reinforcement Learning
  Christopher Lu · Yannick Schroecker · Albert Gu · Emilio Parisotto · Jakob Foerster · Satinder Singh · Feryal Behbahani
- 2023: Synthetic Experience Replay
  Cong Lu · Philip Ball · Yee-Whye Teh · Jack Parker-Holder
- 2023: Do LLMs selectively encode the goal of an agent's reach?
  Laura Ruis · Arduin Findeis · Herbie Bradley · Hossein A. Rahmani · Kyoung Whan Choe · Edward Grefenstette · Tim Rocktäschel
- 2023 Poster: ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs
  Ted Moskovitz · Brendan O'Donoghue · Vivek Veeriah · Sebastian Flennerhag · Satinder Singh · Tom Zahavy
- 2022 Poster: Evolving Curricula with Regret-Based Environment Design
  Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel
- 2022 Spotlight: Evolving Curricula with Regret-Based Environment Design
  Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel
- 2022 Poster: Model-Value Inconsistency as a Signal for Epistemic Uncertainty
  Angelos Filos · Eszter Vértes · Zita Marinho · Gregory Farquhar · Diana Borsa · Abe Friesen · Feryal Behbahani · Tom Schaul · Andre Barreto · Simon Osindero
- 2022 Spotlight: Model-Value Inconsistency as a Signal for Epistemic Uncertainty
  Angelos Filos · Eszter Vértes · Zita Marinho · Gregory Farquhar · Diana Borsa · Abe Friesen · Feryal Behbahani · Tom Schaul · Andre Barreto · Simon Osindero
- 2021 Workshop: ICML 2021 Workshop on Unsupervised Reinforcement Learning
  Feryal Behbahani · Joelle Pineau · Lerrel Pinto · Roberta Raileanu · Aravind Srinivas · Denis Yarats · Amy Zhang
- 2021 Poster: Prioritized Level Replay
  Minqi Jiang · Edward Grefenstette · Tim Rocktäschel
- 2021 Poster: From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization
  Julien Perolat · Remi Munos · Jean-Baptiste Lespiau · Shayegan Omidshafiei · Mark Rowland · Pedro Ortega · Neil Burch · Thomas Anthony · David Balduzzi · Bart De Vylder · Georgios Piliouras · Marc Lanctot · Karl Tuyls
- 2021 Spotlight: Prioritized Level Replay
  Minqi Jiang · Edward Grefenstette · Tim Rocktäschel
- 2021 Spotlight: From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization
  Julien Perolat · Remi Munos · Jean-Baptiste Lespiau · Shayegan Omidshafiei · Mark Rowland · Pedro Ortega · Neil Burch · Thomas Anthony · David Balduzzi · Bart De Vylder · Georgios Piliouras · Marc Lanctot · Karl Tuyls
- 2021 Poster: Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers
  Luke Marris · Paul Muller · Marc Lanctot · Karl Tuyls · Thore Graepel
- 2021 Spotlight: Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers
  Luke Marris · Paul Muller · Marc Lanctot · Karl Tuyls · Thore Graepel
- 2020: The NetHack Learning Environment Q&A
  Tim Rocktäschel · Katja Hofmann
- 2020: The NetHack Learning Environment
  Tim Rocktäschel
- 2020 Workshop: 1st Workshop on Language in Reinforcement Learning (LaReL)
  Nantas Nardelli · Jelena Luketina · Jakob Foerster · Victor Zhong · Jacob Andreas · Tim Rocktäschel · Edward Grefenstette
- 2020 Poster: Fast computation of Nash Equilibria in Imperfect Information Games
  Remi Munos · Julien Perolat · Jean-Baptiste Lespiau · Mark Rowland · Bart De Vylder · Marc Lanctot · Finbarr Timbers · Daniel Hennes · Shayegan Omidshafiei · Audrunas Gruslys · Mohammad Gheshlaghi Azar · Edward Lockhart · Karl Tuyls
- 2020 Poster: Learning Reasoning Strategies in End-to-End Differentiable Proving
  Pasquale Minervini · Sebastian Riedel · Pontus Stenetorp · Edward Grefenstette · Tim Rocktäschel
- 2020 Poster: What Can Learned Intrinsic Rewards Capture?
  Zeyu Zheng · Junhyuk Oh · Matteo Hessel · Zhongwen Xu · Manuel Kroiss · Hado van Hasselt · David Silver · Satinder Singh
- 2019 Poster: Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
  Jakob Foerster · Francis Song · Edward Hughes · Neil Burch · Iain Dunning · Shimon Whiteson · Matthew Botvinick · Michael Bowling
- 2019 Oral: Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
  Jakob Foerster · Francis Song · Edward Hughes · Neil Burch · Iain Dunning · Shimon Whiteson · Matthew Botvinick · Michael Bowling
- 2019 Poster: Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
  Natasha Jaques · Angeliki Lazaridou · Edward Hughes · Caglar Gulcehre · Pedro Ortega · DJ Strouse · Joel Z Leibo · Nando de Freitas
- 2019 Poster: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs
  Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson
- 2019 Oral: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs
  Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson
- 2019 Oral: Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
  Natasha Jaques · Angeliki Lazaridou · Edward Hughes · Caglar Gulcehre · Pedro Ortega · DJ Strouse · Joel Z Leibo · Nando de Freitas
- 2018 Poster: The Mechanics of n-Player Differentiable Games
  David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel
- 2018 Oral: The Mechanics of n-Player Differentiable Games
  David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel
- 2017 Poster: Programming with a Differentiable Forth Interpreter
  Matko Bošnjak · Tim Rocktäschel · Jason Naradowsky · Sebastian Riedel
- 2017 Talk: Programming with a Differentiable Forth Interpreter
  Matko Bošnjak · Tim Rocktäschel · Jason Naradowsky · Sebastian Riedel