Timezone: »
Offline reinforcement learning enables agents to make use of large pre-collected datasets of environment transitions and learn control policies without the need for potentially expensive or unsafe online data collection. Recently, significant progress has been made in offline RL, with the dominant approach becoming methods which leverage a learned dynamics model. This typically involves constructing a probabilistic model, and using it to penalize rewards in regions of high uncertainty, solving for a pessimistic MDP that lower bounds the true MDP. Recent work, however, exhibits a breakdown between theory and practice, whereby pessimistic return ought to be bounded by the total variation distance of the model from the true dynamics, but is instead implemented through a penalty based on estimated model uncertainty. This has spawned a variety of uncertainty heuristics, with little to no comparison between differing approaches. In this paper, we show these heuristics have significant interactions with other design choices, such as the number of models in the ensemble, the model rollout length and the penalty weight. Furthermore, we compare these uncertainty heuristics under a new evaluation protocol that, for the first time, captures the specific covariate shift induced by model-based RL. This allows us to accurately assess the calibration of different proposed penalties. Finally, with these insights, we show that selecting these key hyperparameters using Bayesian Optimization produces drastically stronger performance than existing hand-tuned methods.
Author Information
Cong Lu (University of Oxford)
Philip Ball (University of Oxford)
Jack Parker-Holder (University of Oxford)
Michael A Osborne (U Oxford)
Stephen Roberts (University of Oxford)
More from the Same Authors
-
2021 : Attacking Graph Classification via Bayesian Optimisation »
Xingchen Wan · Henry Kenlay · Binxin Ru · Arno Blaas · Michael A Osborne · Xiaowen Dong -
2022 : Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations »
Cong Lu · Philip Ball · Tim G. J Rudner · Jack Parker-Holder · Michael A Osborne · Yee-Whye Teh -
2023 : The phases of large learning rate gradient descent through effective parameters »
Lawrence Wang · Stephen Roberts -
2023 : Synthetic Experience Replay »
Cong Lu · Philip Ball · Yee-Whye Teh · Jack Parker-Holder -
2023 : SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces »
Masaki Adachi · Satoshi Hayakawa · Saad Hamid · Martin Jørgensen · Harald Oberhauser · Michael A Osborne -
2023 Poster: Efficient Online Reinforcement Learning with Offline Data »
Philip Ball · Laura Smith · Ilya Kostrikov · Sergey Levine -
2022 Poster: From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers »
Krzysztof Choromanski · Han Lin · Haoxian Chen · Tianyi Zhang · Arijit Sehanobish · Valerii Likhosherstov · Jack Parker-Holder · Tamas Sarlos · Adrian Weller · Thomas Weingarten -
2022 Spotlight: From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers »
Krzysztof Choromanski · Han Lin · Haoxian Chen · Tianyi Zhang · Arijit Sehanobish · Valerii Likhosherstov · Jack Parker-Holder · Tamas Sarlos · Adrian Weller · Thomas Weingarten -
2022 Poster: Robust Multi-Objective Bayesian Optimization Under Input Noise »
Samuel Daulton · Sait Cakmak · Maximilian Balandat · Michael A Osborne · Enlu Zhou · Eytan Bakshy -
2022 Spotlight: Robust Multi-Objective Bayesian Optimization Under Input Noise »
Samuel Daulton · Sait Cakmak · Maximilian Balandat · Michael A Osborne · Enlu Zhou · Eytan Bakshy -
2022 Poster: Stabilizing Off-Policy Deep Reinforcement Learning from Pixels »
Edoardo Cetin · Philip Ball · Stephen Roberts · Oya Celiktutan -
2022 Spotlight: Stabilizing Off-Policy Deep Reinforcement Learning from Pixels »
Edoardo Cetin · Philip Ball · Stephen Roberts · Oya Celiktutan -
2021 : Spotlight »
Zhiwei (Tony) Qin · Xianyuan Zhan · Meng Qi · Ruihan Yang · Philip Ball · Hamsa Bastani · Yao Liu · Xiuwen Wang · Haoran Xu · Tony Z. Zhao · Lili Chen · Aviral Kumar -
2021 : Invited Speakers' Panel »
Neeraja J Yadwadkar · Shalmali Joshi · Roberto Bondesan · Engineer Bainomugisha · Stephen Roberts -
2021 : Deployment and monitoring on constrained hardware and devices »
Cecilia Mascolo · Maria Nyamukuru · Ivan Kiskin · Partha Maji · Yunpeng Li · Stephen Roberts -
2021 Workshop: Challenges in Deploying and monitoring Machine Learning Systems »
Alessandra Tosi · Nathan Korda · Michael A Osborne · Stephen Roberts · Andrei Paleyes · Fariba Yousefi -
2021 : Opening remarks »
Alessandra Tosi · Nathan Korda · Fariba Yousefi · Andrei Paleyes · Stephen Roberts -
2021 Poster: Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning »
Luisa Zintgraf · Leo Feng · Cong Lu · Maximilian Igl · Kristian Hartikainen · Katja Hofmann · Shimon Whiteson -
2021 Spotlight: Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning »
Luisa Zintgraf · Leo Feng · Cong Lu · Maximilian Igl · Kristian Hartikainen · Katja Hofmann · Shimon Whiteson -
2021 Poster: Think Global and Act Local: Bayesian Optimisation over High-Dimensional Categorical and Mixed Search Spaces »
Xingchen Wan · Vu Nguyen · Huong Ha · Binxin Ru · Cong Lu · Michael A Osborne -
2021 Poster: Optimal Transport Kernels for Sequential and Parallel Neural Architecture Search »
Vu Nguyen · Tam Le · Makoto Yamada · Michael A Osborne -
2021 Poster: Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment »
Philip Ball · Cong Lu · Jack Parker-Holder · Stephen Roberts -
2021 Spotlight: Think Global and Act Local: Bayesian Optimisation over High-Dimensional Categorical and Mixed Search Spaces »
Xingchen Wan · Vu Nguyen · Huong Ha · Binxin Ru · Cong Lu · Michael A Osborne -
2021 Spotlight: Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment »
Philip Ball · Cong Lu · Jack Parker-Holder · Stephen Roberts -
2021 Spotlight: Optimal Transport Kernels for Sequential and Parallel Neural Architecture Search »
Vu Nguyen · Tam Le · Makoto Yamada · Michael A Osborne -
2020 : Contributed Talk 1: Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits »
Jack Parker-Holder · Vu Nguyen · Stephen Roberts -
2020 Poster: Knowing The What But Not The Where in Bayesian Optimization »
Vu Nguyen · Michael A Osborne -
2020 Poster: Ready Policy One: World Building Through Active Learning »
Philip Ball · Jack Parker-Holder · Aldo Pacchiano · Krzysztof Choromanski · Stephen Roberts -
2020 Poster: Bayesian Optimisation over Multiple Continuous and Categorical Inputs »
Binxin Ru · Ahsan Alvi · Vu Nguyen · Michael A Osborne · Stephen Roberts -
2019 : Poster discussion »
Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari -
2019 Poster: On the Limitations of Representing Functions on Sets »
Edward Wagstaff · Fabian Fuchs · Martin Engelcke · Ingmar Posner · Michael A Osborne -
2019 Oral: On the Limitations of Representing Functions on Sets »
Edward Wagstaff · Fabian Fuchs · Martin Engelcke · Ingmar Posner · Michael A Osborne -
2019 Poster: Automated Model Selection with Bayesian Quadrature »
Henry Chai · Jean-Francois Ton · Michael A Osborne · Roman Garnett -
2019 Poster: AReS and MaRS - Adversarial and MMD-Minimizing Regression for SDEs »
Gabriele Abbati · Philippe Wenk · Michael A Osborne · Andreas Krause · Bernhard Schölkopf · Stefan Bauer -
2019 Poster: Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation »
Ahsan Alvi · Binxin Ru · Jan-Peter Calliess · Stephen Roberts · Michael A Osborne -
2019 Oral: Automated Model Selection with Bayesian Quadrature »
Henry Chai · Jean-Francois Ton · Michael A Osborne · Roman Garnett -
2019 Oral: AReS and MaRS - Adversarial and MMD-Minimizing Regression for SDEs »
Gabriele Abbati · Philippe Wenk · Michael A Osborne · Andreas Krause · Bernhard Schölkopf · Stefan Bauer -
2019 Oral: Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation »
Ahsan Alvi · Binxin Ru · Jan-Peter Calliess · Stephen Roberts · Michael A Osborne -
2019 Poster: Fingerprint Policy Optimisation for Robust Reinforcement Learning »
Supratik Paul · Michael A Osborne · Shimon Whiteson -
2019 Oral: Fingerprint Policy Optimisation for Robust Reinforcement Learning »
Supratik Paul · Michael A Osborne · Shimon Whiteson -
2018 Poster: Fast Information-theoretic Bayesian Optimisation »
Binxin Ru · Michael A Osborne · Mark Mcleod · Diego Granziol -
2018 Poster: Optimization, fast and slow: optimally switching between local and Bayesian optimization »
Mark McLeod · Stephen Roberts · Michael A Osborne -
2018 Oral: Optimization, fast and slow: optimally switching between local and Bayesian optimization »
Mark McLeod · Stephen Roberts · Michael A Osborne -
2018 Oral: Fast Information-theoretic Bayesian Optimisation »
Binxin Ru · Michael A Osborne · Mark Mcleod · Diego Granziol