For offline reinforcement learning (RL), model-based methods are expected to be data-efficient because they incorporate learned dynamics models to generate additional data. However, due to inevitable model errors, naively learning a policy inside the model typically fails in the offline setting. Previous studies have therefore incorporated conservatism to prevent out-of-distribution exploration. For example, MOPO penalizes rewards with uncertainty measures derived from predicting the next states, which we show are loose bounds on the ideal uncertainty, i.e., the Bellman error. In this work, we propose MOdel-Bellman Inconsistency penalized offLinE Policy Optimization (MOBILE), a novel uncertainty-driven offline RL algorithm. MOBILE quantifies uncertainty through the inconsistency of Bellman estimations under an ensemble of learned dynamics models, which yields a closer approximation of the true Bellman error, and penalizes the Bellman estimation with this uncertainty. Empirically, we verify that the proposed uncertainty quantification is significantly closer to the true Bellman error than that of the compared methods. Consequently, MOBILE outperforms prior offline RL approaches on most tasks of the D4RL and NeoRL benchmarks.
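The penalty at the heart of MOBILE can be summarized in a few lines. Below is a minimal sketch in PyTorch, not the authors' implementation: `dynamics_ensemble` (K learned models, each assumed to map a state-action pair to a predicted next state and reward), `policy`, and `q_target` are hypothetical stand-ins for the corresponding learned components.

    import torch

    def bellman_inconsistency_penalty(dynamics_ensemble, policy, q_target,
                                      s, a, gamma=0.99):
        """Standard deviation across ensemble members of the Bellman
        estimate r_i + gamma * Q(s'_i, pi(s'_i)).
        Assumes each model(s, a) returns (next_state, reward)."""
        targets = []
        for model in dynamics_ensemble:           # K learned dynamics models
            with torch.no_grad():
                s_next, r = model(s, a)           # model-predicted transition
                a_next = policy(s_next)           # next action from current policy
                targets.append(r + gamma * q_target(s_next, a_next))
        targets = torch.stack(targets, dim=0)     # shape: (K, batch)
        return targets.std(dim=0)                 # per-sample Bellman inconsistency

The penalized Bellman target for TD learning would then take the form y = (r + gamma * Q(s', a')) - beta * penalty, with beta a penalty coefficient; this follows the uncertainty-penalization scheme described in the abstract above.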
Author Information
Yihao Sun (Nanjing University)
Jiaji Zhang (Nanjing University)
Chengxing Jia (Nanjing University)
Haoxin Lin (Nanjing University)
Junyin Ye (Nanjing University)
Yang Yu (Nanjing University)