Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning
Yihao Sun · Jiaji Zhang · Chengxing Jia · Haoxin Lin · Junyin Ye · Yang Yu

Tue Jul 25 02:00 PM -- 04:30 PM (PDT) @ Exhibit Hall 1 #117

For offline reinforcement learning (RL), model-based methods are expected to be data-efficient because they incorporate dynamics models to generate additional data. However, due to inevitable model errors, naively learning a policy in the model typically fails in the offline setting. Previous studies have incorporated conservatism to prevent out-of-distribution exploration. For example, MOPO penalizes rewards with uncertainty measures derived from predicting the next states, which we show are loose bounds of the ideal uncertainty, i.e., the Bellman error. In this work, we propose MOdel-Bellman Inconsistency penalized offLinE Policy Optimization (MOBILE), a novel uncertainty-driven offline RL algorithm. MOBILE quantifies uncertainty through the inconsistency of Bellman estimations under an ensemble of learned dynamics models, which can better approximate the true Bellman error, and penalizes the Bellman estimation based on this uncertainty. Empirically, we verify that our proposed uncertainty quantification is significantly closer to the true Bellman error than that of the compared methods. Consequently, MOBILE outperforms prior offline RL approaches on most tasks of the D4RL and NeoRL benchmarks.
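To make the core idea concrete, here is a minimal toy sketch of a Bellman-inconsistency penalty: roll each member of a dynamics-model ensemble one step forward, form a Bellman target under each member, and penalize by the disagreement (standard deviation) of those targets. All function names, the toy dynamics, and the penalty coefficient are illustrative assumptions, not the paper's released implementation.

```python
import numpy as np

K, gamma, beta = 5, 0.99, 1.0  # ensemble size, discount, penalty coefficient (illustrative)

def q_value(s):
    """Stand-in critic Q(s, a) for a fixed action; any smooth function of state works here."""
    return np.sin(s)

def model_step(s, a, k):
    """Hypothetical k-th dynamics model: next state and reward, with slight disagreement
    across ensemble members to mimic model uncertainty."""
    next_s = s + a + 0.05 * (k - K / 2.0)
    reward = -s ** 2
    return next_s, reward

s, a = 0.3, 0.1

# Bellman estimate under each model: T_k Q(s, a) = r + gamma * Q(s'_k)
targets = np.array(
    [r + gamma * q_value(next_s) for next_s, r in (model_step(s, a, k) for k in range(K))]
)

# Model-Bellman inconsistency: disagreement of the ensemble's Bellman estimates
inconsistency = targets.std()

# Conservative (penalized) target for policy evaluation
penalized_target = targets.mean() - beta * inconsistency
```

In practice the ensemble members would be learned neural dynamics models and the penalty would be applied inside the critic update, but the quantity penalized is the same: spread of the Bellman targets across models, rather than raw next-state prediction variance.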

Author Information

Yihao Sun (Nanjing University)
Jiaji Zhang (Nanjing University)
Chengxing Jia (Nanjing University)
Haoxin Lin (Nanjing University)
Junyin Ye (Nanjing University)
Yang Yu (Nanjing University)