Stabilizing Off-Policy Deep Reinforcement Learning from Pixels »
Off-policy reinforcement learning (RL) from pixel observations is notoriously unstable. As a result, many successful algorithms must combine different domain-specific practices and auxiliary losses to learn meaningful behaviors in complex environments. In this work, we provide novel analysis demonstrating that these instabilities arise from performing temporal-difference learning with a convolutional encoder and low-magnitude rewards. We show that this new visual deadly triad causes unstable training and premature convergence to degenerate solutions, a phenomenon we name catastrophic self-overfitting. Based on our analysis, we propose A-LIX, a method providing adaptive regularization to the encoder's gradients that explicitly prevents the occurrence of catastrophic self-overfitting using a dual objective. By applying A-LIX, we significantly outperform the prior state-of-the-art on the DeepMind Control and Atari benchmarks without any data augmentation or auxiliary losses.
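The abstract describes A-LIX as applying adaptive regularization to the encoder's gradients. As a rough illustration only (not the authors' implementation), the sketch below shows a hypothetical `local_signal_mixing` operation in NumPy: every spatial location of a convolutional feature map is re-sampled at a randomly shifted sub-pixel coordinate within a radius `S`, via bilinear interpolation with clamped edges. Smoothing features this way also smooths the gradients that flow back through the encoder; in A-LIX the radius is adapted online by a dual objective, which is omitted here.

```python
import numpy as np

def local_signal_mixing(feats, S, rng=None):
    """Illustrative sketch: re-sample a (C, H, W) feature map at
    per-location random sub-pixel offsets drawn from [-S, S],
    using bilinear interpolation with coordinates clamped to the map."""
    rng = np.random.default_rng() if rng is None else rng
    C, H, W = feats.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Continuous offsets, shared across channels at each location.
    dy = rng.uniform(-S, S, size=(H, W))
    dx = rng.uniform(-S, S, size=(H, W))
    y = np.clip(ys + dy, 0, H - 1)
    x = np.clip(xs + dx, 0, W - 1)
    # Integer corners and fractional weights for bilinear interpolation.
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    out = ((1 - wy) * (1 - wx) * feats[:, y0, x0]
           + (1 - wy) * wx * feats[:, y0, x1]
           + wy * (1 - wx) * feats[:, y1, x0]
           + wy * wx * feats[:, y1, x1])
    return out
```

With `S = 0` the operation reduces to the identity, so the radius smoothly interpolates between no regularization and aggressive spatial smoothing; an adaptive scheme can then raise or lower `S` depending on how discontinuous the encoder's gradients become.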
Author Information
Edoardo Cetin (King's College London)
Ph.D. student at King’s College London. My goal is to advance the sample efficiency, stability, and practicality of deep reinforcement learning, enabling autonomous agents to solve meaningful real-world problems. I strive to make contributions with long-term impact, exploring the fundamental aspects of modern deep learning.
Philip Ball (University of Oxford)
Stephen Roberts (University of Oxford)
Oya Celiktutan (King's College London)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: Stabilizing Off-Policy Deep Reinforcement Learning from Pixels »
  Tue. Jul 19th 06:00 -- 06:05 PM, Room 309
More from the Same Authors
- 2021 : Revisiting Design Choices in Offline Model Based Reinforcement Learning »
  Cong Lu · Philip Ball · Jack Parker-Holder · Michael A Osborne · Stephen Roberts
- 2022 : Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations »
  Cong Lu · Philip Ball · Tim G. J Rudner · Jack Parker-Holder · Michael A Osborne · Yee-Whye Teh
- 2023 Poster: Efficient Online Reinforcement Learning with Offline Data »
  Philip Ball · Laura Smith · Ilya Kostrikov · Sergey Levine
- 2021 : Spotlight »
  Zhiwei (Tony) Qin · Xianyuan Zhan · Meng Qi · Ruihan Yang · Philip Ball · Hamsa Bastani · Yao Liu · Xiuwen Wang · Haoran Xu · Tony Z. Zhao · Lili Chen · Aviral Kumar
- 2021 : Invited Speakers' Panel »
  Neeraja J Yadwadkar · Shalmali Joshi · Roberto Bondesan · Engineer Bainomugisha · Stephen Roberts
- 2021 : Deployment and monitoring on constrained hardware and devices »
  Cecilia Mascolo · Maria Nyamukuru · Ivan Kiskin · Partha Maji · Yunpeng Li · Stephen Roberts
- 2021 Workshop: Challenges in Deploying and monitoring Machine Learning Systems »
  Alessandra Tosi · Nathan Korda · Michael A Osborne · Stephen Roberts · Andrei Paleyes · Fariba Yousefi
- 2021 : Opening remarks »
  Alessandra Tosi · Nathan Korda · Fariba Yousefi · Andrei Paleyes · Stephen Roberts
- 2021 Poster: Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment »
  Philip Ball · Cong Lu · Jack Parker-Holder · Stephen Roberts
- 2021 Spotlight: Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment »
  Philip Ball · Cong Lu · Jack Parker-Holder · Stephen Roberts
- 2021 Poster: Learning Routines for Effective Off-Policy Reinforcement Learning »
  Edoardo Cetin · Oya Celiktutan
- 2021 Spotlight: Learning Routines for Effective Off-Policy Reinforcement Learning »
  Edoardo Cetin · Oya Celiktutan
- 2020 : Contributed Talk 1: Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits »
  Jack Parker-Holder · Vu Nguyen · Stephen Roberts
- 2020 Poster: Ready Policy One: World Building Through Active Learning »
  Philip Ball · Jack Parker-Holder · Aldo Pacchiano · Krzysztof Choromanski · Stephen Roberts
- 2020 Poster: Bayesian Optimisation over Multiple Continuous and Categorical Inputs »
  Binxin Ru · Ahsan Alvi · Vu Nguyen · Michael A Osborne · Stephen Roberts
- 2019 : Poster discussion »
  Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari
- 2019 Poster: Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation »
  Ahsan Alvi · Binxin Ru · Jan-Peter Calliess · Stephen Roberts · Michael A Osborne
- 2019 Oral: Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation »
  Ahsan Alvi · Binxin Ru · Jan-Peter Calliess · Stephen Roberts · Michael A Osborne
- 2018 Poster: Optimization, fast and slow: optimally switching between local and Bayesian optimization »
  Mark McLeod · Stephen Roberts · Michael A Osborne
- 2018 Oral: Optimization, fast and slow: optimally switching between local and Bayesian optimization »
  Mark McLeod · Stephen Roberts · Michael A Osborne