Reinforcement learning from large-scale offline datasets enables learning policies without potentially unsafe or impractical exploration. Significant progress has been made in recent years in correcting for the mismatch between the data-collection policy and the learned policy. However, little attention has been paid to dynamics that may change when a policy is transferred to the online setting, where existing methods can lose up to 90% of their performance. In this paper we address this problem with Augmented World Models (AugWM). We augment a learned dynamics model with simple transformations that seek to capture potential changes in the physical properties of the robot, leading to more robust policies. We not only train our policy in this new setting, but also provide it with the sampled augmentation as a context, allowing it to adapt to changes in the environment. At test time we learn the context in a self-supervised fashion by approximating the augmentation that corresponds to the new environment. We rigorously evaluate our approach on over 100 different changed-dynamics settings, and show that this simple approach significantly improves the zero-shot generalization of a recent state-of-the-art baseline, often producing successful policies where the baseline fails.
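The core idea in the abstract — perturbing a learned dynamics model's predictions with a sampled transformation, conditioning the policy on that transformation as a context, and recovering the context at test time from observed transitions — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-dimension scaling of the predicted state delta, the scale range, and all function names are assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_augmentation(state_dim, scale_range=(0.5, 1.5)):
    """Sample a per-dimension scaling of the predicted state delta.

    The scale-only form and the range (0.5, 1.5) are illustrative
    choices, not the exact transformations used in the paper.
    """
    return rng.uniform(*scale_range, size=state_dim)

def augmented_step(model, state, action, context):
    """Roll the learned world model forward, perturbed by the augmentation.

    `model(state, action)` is assumed to predict the state delta s' - s;
    `context` is the sampled augmentation, which would also be fed to
    the policy as an additional input during training.
    """
    delta = model(state, action)
    return state + context * delta

def infer_context(state, action, next_state, model, eps=1e-6):
    """Self-supervised test-time estimate of the context: the
    augmentation that best explains an observed real transition
    under the learned model (here, an elementwise ratio of deltas).
    """
    predicted_delta = model(state, action)
    observed_delta = next_state - state
    return observed_delta / (predicted_delta + eps)
```

Under this sketch, training samples a fresh context per episode and rolls out through `augmented_step`, while at deployment `infer_context` is applied to real transitions so the policy can adapt zero-shot to the changed dynamics.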
Author Information
Philip Ball (University of Oxford)
Cong Lu (University of Oxford)
Jack Parker-Holder (University of Oxford)
Stephen Roberts (University of Oxford)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment »
  Wed. Jul 21st, 01:40 -- 01:45 AM
More from the Same Authors
- 2021: Meta Learning MDPs with linear transition models »
  Robert Müller · Aldo Pacchiano · Jack Parker-Holder
- 2021: Revisiting Design Choices in Offline Model Based Reinforcement Learning »
  Cong Lu · Philip Ball · Jack Parker-Holder · Michael A Osborne · Stephen Roberts
- 2022: Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations »
  Cong Lu · Philip Ball · Tim G. J Rudner · Jack Parker-Holder · Michael A Osborne · Yee-Whye Teh
- 2023: The phases of large learning rate gradient descent through effective parameters »
  Lawrence Wang · Stephen Roberts
- 2023: Synthetic Experience Replay »
  Cong Lu · Philip Ball · Yee-Whye Teh · Jack Parker-Holder
- 2023 Poster: Efficient Online Reinforcement Learning with Offline Data »
  Philip Ball · Laura Smith · Ilya Kostrikov · Sergey Levine
- 2022 Poster: Evolving Curricula with Regret-Based Environment Design »
  Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel
- 2022 Spotlight: Evolving Curricula with Regret-Based Environment Design »
  Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel
- 2022 Poster: Stabilizing Off-Policy Deep Reinforcement Learning from Pixels »
  Edoardo Cetin · Philip Ball · Stephen Roberts · Oya Celiktutan
- 2022 Spotlight: Stabilizing Off-Policy Deep Reinforcement Learning from Pixels »
  Edoardo Cetin · Philip Ball · Stephen Roberts · Oya Celiktutan
- 2021: Spotlight »
  Zhiwei (Tony) Qin · Xianyuan Zhan · Meng Qi · Ruihan Yang · Philip Ball · Hamsa Bastani · Yao Liu · Xiuwen Wang · Haoran Xu · Tony Z. Zhao · Lili Chen · Aviral Kumar
- 2021: Invited Speakers' Panel »
  Neeraja J Yadwadkar · Shalmali Joshi · Roberto Bondesan · Engineer Bainomugisha · Stephen Roberts
- 2021: Deployment and monitoring on constrained hardware and devices »
  Cecilia Mascolo · Maria Nyamukuru · Ivan Kiskin · Partha Maji · Yunpeng Li · Stephen Roberts
- 2021 Workshop: Challenges in Deploying and monitoring Machine Learning Systems »
  Alessandra Tosi · Nathan Korda · Michael A Osborne · Stephen Roberts · Andrei Paleyes · Fariba Yousefi
- 2021: Opening remarks »
  Alessandra Tosi · Nathan Korda · Fariba Yousefi · Andrei Paleyes · Stephen Roberts
- 2021 Poster: Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning »
  Luisa Zintgraf · Leo Feng · Cong Lu · Maximilian Igl · Kristian Hartikainen · Katja Hofmann · Shimon Whiteson
- 2021 Spotlight: Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning »
  Luisa Zintgraf · Leo Feng · Cong Lu · Maximilian Igl · Kristian Hartikainen · Katja Hofmann · Shimon Whiteson
- 2021 Poster: Think Global and Act Local: Bayesian Optimisation over High-Dimensional Categorical and Mixed Search Spaces »
  Xingchen Wan · Vu Nguyen · Huong Ha · Binxin Ru · Cong Lu · Michael A Osborne
- 2021 Spotlight: Think Global and Act Local: Bayesian Optimisation over High-Dimensional Categorical and Mixed Search Spaces »
  Xingchen Wan · Vu Nguyen · Huong Ha · Binxin Ru · Cong Lu · Michael A Osborne
- 2020: Panel Discussion »
  Neil Lawrence · Mihaela van der Schaar · Alex Smola · Valerio Perrone · Jack Parker-Holder · Zhengying Liu
- 2020: Contributed Talk 1: Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits »
  Jack Parker-Holder · Vu Nguyen · Stephen Roberts
- 2020: Spotlight talk 2 - Ridge Riding: Finding diverse solutions by following eigenvectors of the Hessian »
  Jack Parker-Holder
- 2020 Poster: Stochastic Flows and Geometric Optimization on the Orthogonal Group »
  Krzysztof Choromanski · David Cheikhi · Jared Quincy Davis · Valerii Likhosherstov · Achille Nazaret · Achraf Bahamou · Xingyou Song · Mrugank Akarte · Jack Parker-Holder · Jacob Bergquist · Yuan Gao · Aldo Pacchiano · Tamas Sarlos · Adrian Weller · Vikas Sindhwani
- 2020 Poster: Learning to Score Behaviors for Guided Policy Optimization »
  Aldo Pacchiano · Jack Parker-Holder · Yunhao Tang · Krzysztof Choromanski · Anna Choromanska · Michael Jordan
- 2020 Poster: Ready Policy One: World Building Through Active Learning »
  Philip Ball · Jack Parker-Holder · Aldo Pacchiano · Krzysztof Choromanski · Stephen Roberts
- 2020 Poster: Bayesian Optimisation over Multiple Continuous and Categorical Inputs »
  Binxin Ru · Ahsan Alvi · Vu Nguyen · Michael A Osborne · Stephen Roberts
- 2019: Poster discussion »
  Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari
- 2019 Poster: Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation »
  Ahsan Alvi · Binxin Ru · Jan-Peter Calliess · Stephen Roberts · Michael A Osborne
- 2019 Oral: Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation »
  Ahsan Alvi · Binxin Ru · Jan-Peter Calliess · Stephen Roberts · Michael A Osborne
- 2018 Poster: Optimization, fast and slow: optimally switching between local and Bayesian optimization »
  Mark McLeod · Stephen Roberts · Michael A Osborne
- 2018 Oral: Optimization, fast and slow: optimally switching between local and Bayesian optimization »
  Mark McLeod · Stephen Roberts · Michael A Osborne