In offline RL, constraining the learned policy to remain close to the data is essential to prevent the policy from outputting out-of-distribution (OOD) actions with erroneously overestimated values. In principle, generative adversarial networks (GANs) offer an elegant solution: the discriminator directly outputs a probability that quantifies distributional shift. In practice, however, GAN-based offline RL methods have not outperformed alternative approaches, perhaps because the generator is trained both to fool the discriminator and to maximize return, two objectives that are often at odds with each other. In this paper, we show that this conflict can be resolved by training two generators: one that maximizes return, and another that captures the "remainder" of the data distribution in the offline dataset, such that the mixture of the two is close to the behavior policy. We show that the dual-generator design not only enables an effective GAN-based offline RL method, but also approximates a support constraint: the policy does not need to match the entire data distribution, only the slice of the data that leads to high long-term performance. We name our method DASCO, for Dual-Generator Adversarial Support Constrained Offline RL. On benchmark tasks that require learning from sub-optimal data, DASCO significantly outperforms prior methods that enforce distribution constraints.
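The dual-generator idea in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: `pi_actions` and `aux_actions` stand in for samples from the return-maximizing policy and the auxiliary generator, discriminator outputs are assumed to be probabilities in (0, 1), and the 50/50 mixture weight is an assumption for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator_loss(d_data, d_mix):
    """Standard GAN discriminator loss.

    d_data: discriminator probabilities on (state, action) pairs drawn from
            the offline dataset (pushed toward 1).
    d_mix:  probabilities on actions sampled from the mixture of the two
            generators (pushed toward 0).
    """
    return -(np.mean(np.log(d_data)) + np.mean(np.log(1.0 - d_mix)))

def sample_mixture(pi_actions, aux_actions):
    """Draw each fake action from the policy or the auxiliary generator with
    equal probability, so the mixture (not the policy alone) is what the
    discriminator compares against the behavior distribution."""
    mask = rng.random(len(pi_actions)) < 0.5
    return np.where(mask[:, None], pi_actions, aux_actions)

def policy_loss(q_values, d_pi):
    """The return-maximizing generator gets both terms: maximize Q-values and
    fool the discriminator; the auxiliary generator absorbs the parts of the
    data distribution the policy should not have to imitate."""
    return -(np.mean(q_values) + np.mean(np.log(d_pi)))

# Toy batch of 2-D actions from each generator.
pi_a = rng.normal(size=(8, 2))
aux_a = rng.normal(size=(8, 2))
mixed = sample_mixture(pi_a, aux_a)
```

Because only the mixture must match the data, the policy is free to concentrate on the high-return slice of the support while the auxiliary generator covers the rest, which is how the method approximates a support constraint rather than a full distribution constraint.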
Author Information
Quan Vuong (University of California San Diego)
Aviral Kumar (Indian Institute of Technology Bombay)
Final year undergraduate student at IIT Bombay, India. Interning at Google Brain Toronto. Will join UC Berkeley as a Ph.D. student starting Fall 2018.
Sergey Levine (UC Berkeley)
Yevgen Chebotar (Google)
More from the Same Authors
- 2020: DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction » Aviral Kumar
- 2021: Multi-Task Offline Reinforcement Learning with Conservative Data Sharing » Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Sergey Levine · Chelsea Finn
- 2021: Reinforcement Learning as One Big Sequence Modeling Problem » Michael Janner · Qiyang Li · Sergey Levine
- 2021: Intrinsic Control of Variational Beliefs in Dynamic Partially-Observed Visual Environments » Nicholas Rhinehart · Jenny Wang · Glen Berseth · John Co-Reyes · Danijar Hafner · Chelsea Finn · Sergey Levine
- 2021: Explore and Control with Adversarial Surprise » Arnaud Fickinger · Natasha Jaques · Samyak Parajuli · Michael Chang · Nicholas Rhinehart · Glen Berseth · Stuart Russell · Sergey Levine
- 2022: Effective Offline RL Needs Going Beyond Pessimism: Representations and Distributional Shift » Xinyang Geng · Kevin Li · Abhishek Gupta · Aviral Kumar · Sergey Levine
- 2022: Distributionally Adaptive Meta Reinforcement Learning » Anurag Ajay · Dibya Ghosh · Sergey Levine · Pulkit Agrawal · Abhishek Gupta
- 2022: You Only Live Once: Single-Life Reinforcement Learning via Learned Reward Shaping » Annie Chen · Archit Sharma · Sergey Levine · Chelsea Finn
- 2022: Multimodal Masked Autoencoders Learn Transferable Representations » Xinyang Geng · Hao Liu · Lisa Lee · Dale Schuurmans · Sergey Levine · Pieter Abbeel
- 2023 Poster: Reinforcement Learning from Passive Data via Latent Intentions » Dibya Ghosh · Chethan Bhateja · Sergey Levine
- 2023 Poster: Jump-Start Reinforcement Learning » Ikechukwu Uchendu · Ted Xiao · Yao Lu · Banghua Zhu · Mengyuan Yan · Joséphine Simon · Matthew Bennice · Chuyuan Fu · Cong Ma · Jiantao Jiao · Sergey Levine · Karol Hausman
- 2023 Poster: PaLM-E: An Embodied Multimodal Language Model » Danny Driess · Pete Florence · Klaus Greff · Marc Toussaint · Igor Mordatch · Andy Zeng · Vincent Vanhoucke · Mehdi S. M. Sajjadi · Corey Lynch · Ayzaan Wahid · brian ichter · Fei Xia · Pierre Sermanet · Yevgen Chebotar · Jonathan Tompson · Wenlong Huang · Sergey Levine · Tianhe (Kevin) Yu · Karol Hausman · Quan Vuong · Aakanksha Chowdhery · Daniel Duckworth
- 2023 Poster: Efficient Online Reinforcement Learning with Offline Data » Philip Ball · Laura Smith · Ilya Kostrikov · Sergey Levine
- 2023 Poster: Adversarial Policies Beat Superhuman Go AIs » Tony Wang · Adam Gleave · Tom Tseng · Nora Belrose · Kellin Pelrine · Joseph Miller · Michael Dennis · Yawen Duan · Viktor Pogrebniak · Sergey Levine · Stuart Russell
- 2023 Poster: Understanding the Complexity Gains of Single-Task RL with a Curriculum » Qiyang Li · Simon Zhai · Yi Ma · Sergey Levine
- 2023 Poster: Predictable MDP Abstraction for Unsupervised Model-Based RL » Seohong Park · Sergey Levine
- 2023 Poster: A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning » Benjamin Eysenbach · Matthieu Geist · Ruslan Salakhutdinov · Sergey Levine
- 2023 Oral: Reinforcement Learning from Passive Data via Latent Intentions » Dibya Ghosh · Chethan Bhateja · Sergey Levine
- 2023 Oral: Adversarial Policies Beat Superhuman Go AIs » Tony Wang · Adam Gleave · Tom Tseng · Nora Belrose · Kellin Pelrine · Joseph Miller · Michael Dennis · Yawen Duan · Viktor Pogrebniak · Sergey Levine · Stuart Russell
- 2022 Poster: How to Leverage Unlabeled Data in Offline Reinforcement Learning » Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Chelsea Finn · Sergey Levine
- 2022 Spotlight: How to Leverage Unlabeled Data in Offline Reinforcement Learning » Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Chelsea Finn · Sergey Levine
- 2021 Poster: Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills » Yevgen Chebotar · Karol Hausman · Yao Lu · Ted Xiao · Dmitry Kalashnikov · Jacob Varley · Alexander Irpan · Benjamin Eysenbach · Ryan C Julian · Chelsea Finn · Sergey Levine
- 2021 Spotlight: Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills » Yevgen Chebotar · Karol Hausman · Yao Lu · Ted Xiao · Dmitry Kalashnikov · Jacob Varley · Alexander Irpan · Benjamin Eysenbach · Ryan C Julian · Chelsea Finn · Sergey Levine
- 2020 Poster: Striving for Simplicity and Performance in Off-Policy DRL: Output Normalization and Non-Uniform Sampling » Che Wang · Yanqiu Wu · Quan Vuong · Keith Ross