We consider the offline reinforcement learning (RL) setting, where the agent aims to optimize its policy solely from a fixed dataset, without further environment interaction. In offline RL, distributional shift is the primary source of difficulty: the target policy being optimized deviates from the behavior policy used for data collection. This deviation typically causes overestimation of action values, which poses severe problems for model-free algorithms that use bootstrapping. To mitigate the problem, prior offline RL algorithms often rely on sophisticated techniques that encourage underestimation of action values, introducing an additional set of hyperparameters that must be tuned carefully. In this paper, we present an offline RL algorithm that prevents overestimation in a more principled way. Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy and, unlike previous offline RL algorithms, does not rely on policy gradients. Using an extensive set of benchmark datasets for offline RL, we show that OptiDICE performs competitively with state-of-the-art methods.
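To make the abstract's description concrete, here is a minimal tabular sketch of a DICE-style dual objective, not the paper's actual implementation. It assumes an f-divergence regularizer with f(x) = (x - 1)^2 / 2 (the chi-squared case), for which the inner maximization over the correction w(s, a) has a closed form, leaving an unconstrained minimization over a value-like dual variable nu. All names and hyperparameters below (optidice_tabular, P, R, dD, alpha, lr) are illustrative assumptions.

```python
# Minimal tabular sketch of a DICE-style dual objective (illustrative only).
# Assumes the chi-squared divergence f(x) = (x - 1)^2 / 2, whose inner
# maximization yields the closed form w(s, a) = max(0, 1 + e_nu(s, a) / alpha).
import numpy as np

def optidice_tabular(P, R, p0, dD, gamma=0.99, alpha=1.0, lr=0.1, iters=5000):
    """Estimate stationary distribution corrections w and extract a policy.

    P  : (S, A, S) transition probabilities
    R  : (S, A)    rewards
    p0 : (S,)      initial-state distribution
    dD : (S, A)    empirical state-action distribution of the offline dataset
    """
    S, A = R.shape
    nu = np.zeros(S)  # dual variable; plays the role of a value function
    for _ in range(iters):
        # e_nu(s, a) = r(s, a) + gamma * E[nu(s')] - nu(s): an "advantage" of nu
        e_nu = R + gamma * (P @ nu) - nu[:, None]
        # Closed-form correction under the chi-squared divergence
        w = np.maximum(0.0, 1.0 + e_nu / alpha)
        # Gradient of the dual objective
        #   L(nu) = (1 - gamma) <p0, nu> + E_dD[-alpha * f(w) + w * e_nu],
        # holding w fixed at its maximizer (Danskin's theorem), so only the
        # e_nu term depends on nu:
        dw = dD * w
        grad = ((1.0 - gamma) * p0
                - dw.sum(axis=1)                        # d e_nu / d nu(s) = -1
                + gamma * np.einsum('sat,sa->t', P, dw))  # + gamma * P(s'|s,a)
        nu -= lr * grad
    # Recompute the correction at the final nu
    e_nu = R + gamma * (P @ nu) - nu[:, None]
    w = np.maximum(0.0, 1.0 + e_nu / alpha)
    # Policy extraction: pi(a|s) proportional to the corrected distribution
    # dD(s, a) * w(s, a) -- no policy-gradient step involved
    d_corr = dD * w
    pi = d_corr / np.maximum(d_corr.sum(axis=1, keepdims=True), 1e-12)
    return w, pi
```

Note that the final policy is read off from the corrected dataset distribution dD(s, a) * w(s, a) rather than obtained by a policy-gradient update, which is the sense in which the abstract says OptiDICE does not rely on policy gradients.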
Author Information
Jongmin Lee (KAIST)
Wonseok Jeon (MILA, McGill University)
Byung-Jun Lee (Gauss Labs Inc.)
Joelle Pineau (McGill University / Facebook)
Kee-Eung Kim (KAIST)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation »
  Wed. Jul 21st, 04:00 -- 06:00 AM, Room: Virtual
More from the Same Authors
- 2023 : Fostering Women's Leadership in the Realm of Emerging Trends and Technologies »
  Joelle Pineau · Rihab Gorsane · Pascale FUNG
- 2023 : Joelle Pineau - A culture of open and reproducible research, in the era of large AI generative models »
  Joelle Pineau
- 2023 Panel: The Societal Impacts of AI »
  Sanmi Koyejo · Samy Bengio · Ashia Wilson · Kirikowhai Mikaere · Joelle Pineau
- 2023 Poster: Information-Theoretic State Space Model for Multi-View Reinforcement Learning »
  HyeongJoo Hwang · Seokin Seo · Youngsoo Jang · Sungyoon Kim · Geon-Hyeong Kim · Seunghoon Hong · Kee-Eung Kim
- 2023 Oral: Information-Theoretic State Space Model for Multi-View Reinforcement Learning »
  HyeongJoo Hwang · Seokin Seo · Youngsoo Jang · Sungyoon Kim · Geon-Hyeong Kim · Seunghoon Hong · Kee-Eung Kim
- 2022 Poster: PAC-Net: A Model Pruning Approach to Inductive Transfer Learning »
  Sanghoon Myung · In Huh · Wonik Jang · Jae Myung Choe · Jisu Ryu · Daesin Kim · Kee-Eung Kim · Changwook Jeong
- 2022 Spotlight: PAC-Net: A Model Pruning Approach to Inductive Transfer Learning »
  Sanghoon Myung · In Huh · Wonik Jang · Jae Myung Choe · Jisu Ryu · Daesin Kim · Kee-Eung Kim · Changwook Jeong
- 2021 Workshop: ICML 2021 Workshop on Unsupervised Reinforcement Learning »
  Feryal Behbahani · Joelle Pineau · Lerrel Pinto · Roberta Raileanu · Aravind Srinivas · Denis Yarats · Amy Zhang
- 2020 Workshop: MLRetrospectives: A Venue for Self-Reflection in ML Research »
  Jessica Forde · Jesse Dodge · Mayoore Jaiswal · Rosanne Liu · Ryan Lowe · Joelle Pineau · Yoshua Bengio
- 2020 Poster: Online Learned Continual Compression with Adaptive Quantization Modules »
  Lucas Caccia · Eugene Belilovsky · Massimo Caccia · Joelle Pineau
- 2020 Poster: Variational Inference for Sequential Data with Future Likelihood Estimates »
  Geon-Hyeong Kim · Youngsoo Jang · Hongseok Yang · Kee-Eung Kim
- 2020 Poster: Constrained Markov Decision Processes via Backward Value Functions »
  Harsh Satija · Philip Amortila · Joelle Pineau
- 2020 Poster: Interference and Generalization in Temporal Difference Learning »
  Emmanuel Bengio · Joelle Pineau · Doina Precup
- 2020 Poster: Invariant Causal Prediction for Block MDPs »
  Amy Zhang · Clare Lyle · Shagun Sodhani · Angelos Filos · Marta Kwiatkowska · Joelle Pineau · Yarin Gal · Doina Precup
- 2020 Poster: Batch Reinforcement Learning with Hyperparameter Gradients »
  Byung-Jun Lee · Jongmin Lee · Peter Vrancx · Dongho Kim · Kee-Eung Kim
- 2019 Workshop: Generative Modeling and Model-Based Reasoning for Robotics and AI »
  Aravind Rajeswaran · Emanuel Todorov · Igor Mordatch · William Agnew · Amy Zhang · Joelle Pineau · Michael Chang · Dumitru Erhan · Sergey Levine · Kimberly Stachenfeld · Marvin Zhang
- 2019 Poster: Separable value functions across time-scales »
  Joshua Romoff · Peter Henderson · Ahmed Touati · Yann Ollivier · Joelle Pineau · Emma Brunskill
- 2019 Oral: Separable value functions across time-scales »
  Joshua Romoff · Peter Henderson · Ahmed Touati · Yann Ollivier · Joelle Pineau · Emma Brunskill
- 2018 Poster: Focused Hierarchical RNNs for Conditional Sequence Processing »
  Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal
- 2018 Oral: Focused Hierarchical RNNs for Conditional Sequence Processing »
  Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal
- 2018 Poster: An Inference-Based Policy Gradient Method for Learning Options »
  Matthew Smith · Herke van Hoof · Joelle Pineau
- 2018 Oral: An Inference-Based Policy Gradient Method for Learning Options »
  Matthew Smith · Herke van Hoof · Joelle Pineau
- 2017 Workshop: Reproducibility in Machine Learning Research »
  Rosemary Nan Ke · Anirudh Goyal · Alex Lamb · Joelle Pineau · Samy Bengio · Yoshua Bengio
- 2017 : Lifelong Learning - Panel Discussion »
  Sergey Levine · Joelle Pineau · Balaraman Ravindran · Andrei A Rusu
- 2017 : Joelle Pineau: A few modest insights from my lifelong learning »
  Joelle Pineau