Budgeting Counterfactual for Offline RL
Yao Liu · Pratik Chaudhari · Rasool Fakoor
The main challenge of offline reinforcement learning, where data is limited, arises from a sequence of counterfactual reasoning dilemmas over potential actions: what if we were to choose a different course of action? These dilemmas frequently give rise to extrapolation errors, which tend to accumulate exponentially with the problem horizon. Hence, it becomes crucial to acknowledge that not all decision steps are equally important to the final outcome, and to budget the number of counterfactual decisions a policy makes in order to control extrapolation. Contrary to existing approaches that regularize either the policy or the value function, we propose to explicitly bound the number of out-of-distribution actions taken during training. Specifically, our method uses dynamic programming to decide where to extrapolate and where not to, subject to an upper bound on the number of decisions that differ from the behavior policy. It balances the potential for improvement from taking out-of-distribution actions against the risk of errors due to extrapolation. Theoretically, we justify our method by the constrained optimality of the fixed-point solution to our $Q$ updating rules. Empirically, we show that the overall performance of our method exceeds that of state-of-the-art offline RL methods on tasks from the widely used D4RL benchmarks.
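To make the budgeting idea concrete, here is a minimal, illustrative sketch, not the paper's actual algorithm or $Q$ updating rule: tabular value iteration on a budget-augmented state $(s, b)$, where $b$ counts the remaining out-of-distribution ("counterfactual") decisions the policy may still take. The function and parameter names, the known tabular model, and the deterministic behavior policy are all assumptions made for this example.

```python
# Illustrative sketch only: tabular value iteration on a budget-augmented MDP.
# V[s, b] is the best return from state s when the policy may still deviate from
# the behavior policy at most b more times. All names here (P, R, behavior_action,
# budget) are assumptions for the example, not the paper's interface.
import numpy as np


def budgeted_value_iteration(P, R, behavior_action, budget, gamma=0.99, n_iters=500):
    """P: (S, A, S) transitions, R: (S, A) rewards,
    behavior_action: (S,) action of a deterministic behavior policy,
    budget: max number of steps at which the plan may deviate from the behavior policy."""
    S, A, _ = P.shape
    idx = np.arange(S)
    V = np.zeros((S, budget + 1))
    for _ in range(n_iters):
        V_new = np.empty_like(V)
        for b in range(budget + 1):
            # Option 1: follow the behavior action; the remaining budget is preserved.
            follow = R[idx, behavior_action] + gamma * (P[idx, behavior_action] @ V[:, b])
            if b == 0:
                V_new[:, b] = follow  # no budget left: never extrapolate
            else:
                # Option 2: take any (possibly out-of-distribution) action,
                # spending one unit of the counterfactual budget.
                deviate = (R + gamma * P @ V[:, b - 1]).max(axis=1)
                V_new[:, b] = np.maximum(follow, deviate)
        V = V_new
    return V


# Toy usage: a random 3-state, 2-action MDP whose behavior policy always picks action 0.
rng = np.random.default_rng(0)
S, A = 3, 2
P = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)
R = rng.normal(size=(S, A))
V = budgeted_value_iteration(P, R, behavior_action=np.zeros(S, dtype=int), budget=1)
print(V)  # column b=1 is at least as large as column b=0 in every state
```

The budget dimension is what enforces a hard cap on deviations, in contrast to soft regularization of the policy or value function; the paper develops this idea with $Q$ updating rules on offline data rather than a known tabular model as assumed above.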
Author Information
Yao Liu (Amazon)
Pratik Chaudhari (UPenn, AWS)
Rasool Fakoor (AWS)
More from the Same Authors
- 2021 : Continuous Doubly Constrained Batch Reinforcement Learning
  Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Pratik Chaudhari · Alex Smola
- 2023 : The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold
  Jialin Mao · Han Kheng Teoh · Itay Griniasty · Rahul Ramesh · Rubing Yang · Mark Transtrum · James Sethna · Pratik Chaudhari
- 2023 Workshop: New Frontiers in Learning, Control, and Dynamical Systems
  Valentin De Bortoli · Charlotte Bunne · Guan-Horng Liu · Tianrong Chen · Maxim Raginsky · Pratik Chaudhari · Melanie Zeilinger · Animashree Anandkumar
- 2023 Poster: The Value of Out-of-Distribution Data
  Ashwin De Silva · Rahul Ramesh · Carey Priebe · Pratik Chaudhari · Joshua Vogelstein
- 2023 Poster: A Picture of the Space of Typical Learnable Tasks
  Rahul Ramesh · Jialin Mao · Itay Griniasty · Rubing Yang · Han Kheng Teoh · Mark Transtrum · James Sethna · Pratik Chaudhari
- 2023 Poster: Flexible Model Aggregation for Quantile Regression
  Rasool Fakoor · Taesup Kim · Jonas Mueller · Alexander Smola · Ryan Tibshirani
- 2022 Poster: Does the Data Induce Capacity Control in Deep Learning?
  Rubing Yang · Jialin Mao · Pratik Chaudhari
- 2022 Spotlight: Does the Data Induce Capacity Control in Deep Learning?
  Rubing Yang · Jialin Mao · Pratik Chaudhari
- 2022 Poster: Deep Reference Priors: What is the best way to pretrain a model?
  Yansong Gao · Rahul Ramesh · Pratik Chaudhari
- 2022 Spotlight: Deep Reference Priors: What is the best way to pretrain a model?
  Yansong Gao · Rahul Ramesh · Pratik Chaudhari
- 2021 Poster: An Information-Geometric Distance on the Space of Tasks
  Yansong Gao · Pratik Chaudhari
- 2021 Spotlight: An Information-Geometric Distance on the Space of Tasks
  Yansong Gao · Pratik Chaudhari
- 2020 Poster: A Free-Energy Principle for Representation Learning
  Yansong Gao · Pratik Chaudhari