The last decade has been revolutionary for reinforcement learning (RL): it can now solve complex decision and control problems. Successful RL methods were handcrafted using mathematical derivations, intuition, and experimentation. This approach has a major shortcoming: it results in specific solutions to the RL problem, rather than a protocol for discovering efficient and robust methods. In contrast, the emerging field of meta-learning provides a toolkit for automatically optimising machine learning methods, potentially addressing this flaw. However, black-box approaches that attempt to discover RL algorithms with minimal prior structure have thus far not been successful. Mirror Learning, which includes RL algorithms such as PPO, offers a potential framework. In this paper we explore the Mirror Learning space by meta-learning a “drift” function. We refer to the result as Learnt Policy Optimisation (LPO). By analysing LPO we gain original insights into policy optimisation, which we use to formulate a novel, closed-form RL algorithm, Discovered Policy Optimisation (DPO). Our experiments in Brax environments confirm state-of-the-art performance of LPO and DPO, as well as their transfer to unseen settings.
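As a rough illustration of what a “drift” function is, the sketch below writes PPO's clipped objective as the plain importance-weighted advantage minus a non-negative drift penalty. This is the standard algebraic decomposition of the PPO clip objective, not the learnt LPO drift or the closed-form DPO drift from the paper. All function names and the default `eps=0.2` are assumptions chosen for this example.

```python
def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def ppo_drift(ratio, advantage, eps=0.2):
    """Per-sample drift penalty (hypothetical name).

    Computes ReLU((ratio - clip(ratio, 1 - eps, 1 + eps)) * advantage).
    It is zero when ratio == 1 (new policy equals old policy) and never negative.
    """
    return max(0.0, (ratio - clip(ratio, 1.0 - eps, 1.0 + eps)) * advantage)


def drift_penalised_objective(ratio, advantage, drift=ppo_drift):
    """Surrogate objective: importance-weighted advantage minus the drift penalty."""
    return ratio * advantage - drift(ratio, advantage)


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """The familiar PPO clipped surrogate, for comparison."""
    return min(ratio * advantage, clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)


if __name__ == "__main__":
    # The two forms agree on every (ratio, advantage) pair in this small grid.
    for ratio in (0.5, 0.9, 1.0, 1.1, 1.5):
        for advantage in (-2.0, -0.5, 0.0, 0.5, 2.0):
            a = drift_penalised_objective(ratio, advantage)
            b = ppo_clip_objective(ratio, advantage)
            print(f"ratio={ratio:<4} adv={advantage:<5} penalised={a:+.3f} clipped={b:+.3f}")
```

In this form, swapping `ppo_drift` for another non-negative function that vanishes at `ratio == 1` yields a different policy optimisation objective, which is the kind of component the abstract describes meta-learning.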
Author Information
Christopher Lu (University of Oxford)
Jakub Grudzien Kuba (University of Oxford)
Alistair Letcher (None)
Luke Metz (Google Brain)
Christian Schroeder (University of Oxford)
Jakob Foerster (University of Oxford)
Jakob Foerster started as an Associate Professor at the Department of Engineering Science at the University of Oxford in the fall of 2021. During his PhD at Oxford he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. After his PhD he worked as a research scientist at Facebook AI Research in California, where he continued doing foundational work. He was the lead organizer of the first Emergent Communication workshop at NeurIPS in 2017, which he has helped organize ever since, and he was awarded a prestigious CIFAR AI chair in 2019. His past work addresses how AI agents can learn to cooperate and communicate with other agents. Most recently he has been developing and addressing the zero-shot coordination problem setting, a crucial step towards human-AI coordination.
More from the Same Authors
-
2022 : Adversarial Cheap Talk »
Christopher Lu · Timon Willi · Alistair Letcher · Jakob Foerster -
2022 : Illusionary Attacks on Sequential Decision Makers and Countermeasures »
Tim Franzmeyer · Joao Henriques · Jakob Foerster · Phil Torr · Adel Bibi · Christian Schroeder -
2023 : Illusory Attacks: Detectability Matters in Adversarial Attacks on Sequential Decision-Makers »
Tim Franzmeyer · Stephen Mcaleer · Joao Henriques · Jakob Foerster · Phil Torr · Adel Bibi · Christian Schroeder -
2023 : Analyzing the Sample Complexity of Model-Free Opponent Shaping »
Kitty Fung · Qizhen Zhang · Christopher Lu · Timon Willi · Jakob Foerster -
2023 : Structured State Space Models for In-Context Reinforcement Learning »
Christopher Lu · Yannick Schroecker · Albert Gu · Emilio Parisotto · Jakob Foerster · Satinder Singh · Feryal Behbahani -
2023 : Who to imitate: Imitating desired behavior from diverse multi-agent datasets »
Tim Franzmeyer · Jakob Foerster · Edith Elkind · Phil Torr · Joao Henriques -
2023 Poster: Learning Intuitive Policies Using Action Features »
Mingwei Ma · Jizhou Liu · Samuel Sokota · Max Kleiman-Weiner · Jakob Foerster -
2023 Poster: Adversarial Cheap Talk »
Christopher Lu · Timon Willi · Alistair Letcher · Jakob Foerster -
2022 Poster: Evolving Curricula with Regret-Based Environment Design »
Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2022 Poster: COLA: Consistent Learning with Opponent-Learning Awareness »
Timon Willi · Alistair Letcher · Johannes Treutlein · Jakob Foerster -
2022 Spotlight: Evolving Curricula with Regret-Based Environment Design »
Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2022 Spotlight: COLA: Consistent Learning with Opponent-Learning Awareness »
Timon Willi · Alistair Letcher · Johannes Treutlein · Jakob Foerster -
2022 Poster: Communicating via Markov Decision Processes »
Samuel Sokota · Christian Schroeder · Maximilian Igl · Luisa Zintgraf · Phil Torr · Martin Strohmeier · Zico Kolter · Shimon Whiteson · Jakob Foerster -
2022 Spotlight: Communicating via Markov Decision Processes »
Samuel Sokota · Christian Schroeder · Maximilian Igl · Luisa Zintgraf · Phil Torr · Martin Strohmeier · Zico Kolter · Shimon Whiteson · Jakob Foerster -
2022 Poster: Model-Free Opponent Shaping »
Christopher Lu · Timon Willi · Christian Schroeder de Witt · Jakob Foerster -
2022 Poster: Mirror Learning: A Unifying Framework of Policy Optimisation »
Jakub Grudzien Kuba · Christian Schroeder de Witt · Jakob Foerster -
2022 Poster: Generalized Beliefs for Cooperative AI »
Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster -
2022 Spotlight: Generalized Beliefs for Cooperative AI »
Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster -
2022 Spotlight: Model-Free Opponent Shaping »
Christopher Lu · Timon Willi · Christian Schroeder de Witt · Jakob Foerster -
2022 Spotlight: Mirror Learning: A Unifying Framework of Policy Optimisation »
Jakub Grudzien Kuba · Christian Schroeder de Witt · Jakob Foerster -
2021 Poster: Learn2Hop: Learned Optimization on Rough Landscapes »
Amil Merchant · Luke Metz · Samuel Schoenholz · Ekin Dogus Cubuk -
2021 Spotlight: Learn2Hop: Learned Optimization on Rough Landscapes »
Amil Merchant · Luke Metz · Samuel Schoenholz · Ekin Dogus Cubuk -
2021 Poster: On Linear Identifiability of Learned Representations »
Geoffrey Roeder · Luke Metz · Durk Kingma -
2021 Spotlight: On Linear Identifiability of Learned Representations »
Geoffrey Roeder · Luke Metz · Durk Kingma -
2021 Poster: Off-Belief Learning »
Hengyuan Hu · Adam Lerer · Brandon Cui · Luis Pineda · Noam Brown · Jakob Foerster -
2021 Spotlight: Off-Belief Learning »
Hengyuan Hu · Adam Lerer · Brandon Cui · Luis Pineda · Noam Brown · Jakob Foerster -
2021 Poster: Trajectory Diversity for Zero-Shot Coordination »
Andrei Lupu · Brandon Cui · Hengyuan Hu · Jakob Foerster -
2021 Poster: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies »
Paul Vicol · Luke Metz · Jascha Sohl-Dickstein -
2021 Oral: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies »
Paul Vicol · Luke Metz · Jascha Sohl-Dickstein -
2021 Spotlight: Trajectory Diversity for Zero-Shot Coordination »
Andrei Lupu · Brandon Cui · Hengyuan Hu · Jakob Foerster -
2021 Poster: A New Formalism, Method and Open Issues for Zero-Shot Coordination »
Johannes Treutlein · Michael Dennis · Caspar Oesterheld · Jakob Foerster -
2021 Spotlight: A New Formalism, Method and Open Issues for Zero-Shot Coordination »
Johannes Treutlein · Michael Dennis · Caspar Oesterheld · Jakob Foerster -
2020 Poster: “Other-Play” for Zero-Shot Coordination »
Hengyuan Hu · Alexander Peysakhovich · Adam Lerer · Jakob Foerster -
2019 : Spotlight »
Tyler Scott · Kiran Thekumparampil · Jonathan Aigrain · Rene Bidart · Priyadarshini Panda · Dian Ang Yap · Yaniv Yacoby · Raphael Gontijo Lopes · Alberto Marchisio · Erik Englesson · Wanqian Yang · Moritz Graule · Yi Sun · Daniel Kang · Mike Dusenberry · Min Du · Hartmut Maennel · Kunal Menda · Vineet Edupuganti · Luke Metz · David Stutz · Vignesh Srinivasan · Timo Sämann · Vineeth N Balasubramanian · Sina Mohseni · Rob Cornish · Judith Butepage · Zhangyang Wang · Bai Li · Bo Han · Honglin Li · Maksym Andriushchenko · Lukas Ruff · Meet P. Vadera · Yaniv Ovadia · Sunil Thulasidasan · Disi Ji · Gang Niu · Saeed Mahloujifar · Aviral Kumar · SANGHYUK CHUN · Dong Yin · Joyce Xu Xu · Hugo Gomes · Raanan Rohekar -
2019 Poster: Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Francis Song · Edward Hughes · Neil Burch · Iain Dunning · Shimon Whiteson · Matthew Botvinick · Michael Bowling -
2019 Poster: Understanding and correcting pathologies in the training of learned optimizers »
Luke Metz · Niru Maheswaranathan · Jeremy Nixon · Daniel Freeman · Jascha Sohl-Dickstein -
2019 Poster: Guided evolutionary strategies: augmenting random search with surrogate gradients »
Niru Maheswaranathan · Luke Metz · George Tucker · Dami Choi · Jascha Sohl-Dickstein -
2019 Oral: Guided evolutionary strategies: augmenting random search with surrogate gradients »
Niru Maheswaranathan · Luke Metz · George Tucker · Dami Choi · Jascha Sohl-Dickstein -
2019 Oral: Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Francis Song · Edward Hughes · Neil Burch · Iain Dunning · Shimon Whiteson · Matthew Botvinick · Michael Bowling -
2019 Oral: Understanding and correcting pathologies in the training of learned optimizers »
Luke Metz · Niru Maheswaranathan · Jeremy Nixon · Daniel Freeman · Jascha Sohl-Dickstein -
2019 Poster: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs »
Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson -
2019 Oral: A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs »
Jingkai Mao · Jakob Foerster · Tim Rocktäschel · Maruan Al-Shedivat · Gregory Farquhar · Shimon Whiteson -
2018 Poster: The Mechanics of n-Player Differentiable Games »
David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel -
2018 Poster: QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning »
Tabish Rashid · Mikayel Samvelyan · Christian Schroeder · Gregory Farquhar · Jakob Foerster · Shimon Whiteson -
2018 Oral: The Mechanics of n-Player Differentiable Games »
David Balduzzi · Sebastien Racaniere · James Martens · Jakob Foerster · Karl Tuyls · Thore Graepel -
2018 Oral: QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning »
Tabish Rashid · Mikayel Samvelyan · Christian Schroeder · Gregory Farquhar · Jakob Foerster · Shimon Whiteson -
2018 Poster: DiCE: The Infinitely Differentiable Monte Carlo Estimator »
Jakob Foerster · Gregory Farquhar · Maruan Al-Shedivat · Tim Rocktäschel · Eric Xing · Shimon Whiteson -
2018 Oral: DiCE: The Infinitely Differentiable Monte Carlo Estimator »
Jakob Foerster · Gregory Farquhar · Maruan Al-Shedivat · Tim Rocktäschel · Eric Xing · Shimon Whiteson -
2017 Poster: Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Nantas Nardelli · Gregory Farquhar · Triantafyllos Afouras · Phil Torr · Pushmeet Kohli · Shimon Whiteson -
2017 Talk: Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Nantas Nardelli · Gregory Farquhar · Triantafyllos Afouras · Phil Torr · Pushmeet Kohli · Shimon Whiteson -
2017 Poster: Input Switched Affine Networks: An RNN Architecture Designed for Interpretability »
Jakob Foerster · Justin Gilmer · Jan Chorowski · Jascha Sohl-Dickstein · David Sussillo -
2017 Talk: Input Switched Affine Networks: An RNN Architecture Designed for Interpretability »
Jakob Foerster · Justin Gilmer · Jan Chorowski · Jascha Sohl-Dickstein · David Sussillo