Standard off-policy reinforcement learning (RL) methods based on temporal difference (TD) learning generally fail to learn good policies when applied to static offline datasets. Conventionally, this is attributed to distribution shift, where the Bellman backup queries high-value out-of-distribution (OOD) actions for the next time step, which then leads to systematic overestimation. However, this explanation is incomplete, as conservative offline RL methods that directly address overestimation still suffer from stability problems in practice. This suggests that although OOD actions may account for part of the challenge, the difficulties with TD learning in the offline setting are also deeply connected to other aspects, such as the quality of the representations learned by function approximators. In this work, we show that merely imposing pessimism is not sufficient for good performance, and demonstrate empirically that regularizing representations accounts for a large part of the improvement observed in modern offline RL methods. Building on this insight, we identify concrete metrics that enable effective diagnosis of the quality of the learned representation and that adequately predict the performance of the underlying method. Finally, we show that a simple approach for handling representations, without changing any other aspect of conservative offline RL algorithms, can lead to better performance in several offline RL problems.
Author Information
Xinyang Geng (UC Berkeley)
Kevin Li (UC Berkeley)
Abhishek Gupta (UC Berkeley)
Aviral Kumar (Indian Institute of Technology Bombay)
Final year undergraduate student at IIT Bombay, India. Interning at Google Brain Toronto. Will join UC Berkeley as a Ph.D. student starting Fall 2018.
Sergey Levine (University of Washington)
More from the Same Authors
-
2020 : DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction »
Aviral Kumar -
2021 : Reinforcement Learning as One Big Sequence Modeling Problem »
Michael Janner · Qiyang Li · Sergey Levine -
2021 : Intrinsic Control of Variational Beliefs in Dynamic Partially-Observed Visual Environments »
Nicholas Rhinehart · Jenny Wang · Glen Berseth · John Co-Reyes · Danijar Hafner · Chelsea Finn · Sergey Levine -
2021 : Explore and Control with Adversarial Surprise »
Arnaud Fickinger · Natasha Jaques · Samyak Parajuli · Michael Chang · Nicholas Rhinehart · Glen Berseth · Stuart Russell · Sergey Levine -
2021 : Reset-Free Reinforcement Learning via Multi-Task Learning: Learning Dexterous Manipulation Behaviors without Human Intervention »
Abhishek Gupta · Justin Yu · Tony Z. Zhao · Vikash Kumar · Aaron Rovinsky · Kelvin Xu · Thomas Devlin · Sergey Levine -
2022 : DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning »
Quan Vuong · Aviral Kumar · Sergey Levine · Yevgen Chebotar -
2022 : Distributionally Adaptive Meta Reinforcement Learning »
Anurag Ajay · Dibya Ghosh · Sergey Levine · Pulkit Agrawal · Abhishek Gupta -
2022 : You Only Live Once: Single-Life Reinforcement Learning via Learned Reward Shaping »
Annie Chen · Archit Sharma · Sergey Levine · Chelsea Finn -
2022 : Multimodal Masked Autoencoders Learn Transferable Representations »
Xinyang Geng · Hao Liu · Lisa Lee · Dale Schuurmans · Sergey Levine · Pieter Abbeel -
2022 Poster: Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization »
Brandon Trabucco · Xinyang Geng · Aviral Kumar · Sergey Levine -
2022 Spotlight: Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization »
Brandon Trabucco · Xinyang Geng · Aviral Kumar · Sergey Levine -
2021 Poster: MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning »
Kevin Li · Abhishek Gupta · Ashwin D Reddy · Vitchyr Pong · Aurick Zhou · Justin Yu · Sergey Levine -
2021 Spotlight: MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning »
Kevin Li · Abhishek Gupta · Ashwin D Reddy · Vitchyr Pong · Aurick Zhou · Justin Yu · Sergey Levine -
2019 Workshop: Workshop on Multi-Task and Lifelong Reinforcement Learning »
Sarath Chandar · Shagun Sodhani · Khimya Khetarpal · Tom Zahavy · Daniel J. Mankowitz · Shie Mannor · Balaraman Ravindran · Doina Precup · Chelsea Finn · Abhishek Gupta · Amy Zhang · Kyunghyun Cho · Andrei A Rusu · Rob Fergus -
2018 Poster: Automatic Goal Generation for Reinforcement Learning Agents »
Carlos Florensa · David Held · Xinyang Geng · Pieter Abbeel -
2018 Poster: Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings »
John Co-Reyes · Yu Xuan Liu · Abhishek Gupta · Benjamin Eysenbach · Pieter Abbeel · Sergey Levine -
2018 Oral: Automatic Goal Generation for Reinforcement Learning Agents »
Carlos Florensa · David Held · Xinyang Geng · Pieter Abbeel -
2018 Oral: Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings »
John Co-Reyes · Yu Xuan Liu · Abhishek Gupta · Benjamin Eysenbach · Pieter Abbeel · Sergey Levine