Model-based Offline Reinforcement Learning with Local Misspecification
Kefan Dong · Ramtin Keramati · Emma Brunskill

In this paper, we propose a model-based offline reinforcement learning algorithm that explicitly handles model misspecification and distribution mismatch. Theoretically, we prove a safe policy improvement theorem by establishing pessimistic approximations to the value function. Our algorithm can output the best policy in the given policy class, with interpretable error terms measuring the misspecification level, distribution mismatch, and statistical deviation. In addition, as long as the model family can approximate the transitions of the state-action pairs visited by a policy, we can approximate the value of that particular policy. We visualize the effect of the error terms in the LQR setting and show that the experimental results match our theory.

Author Information

Kefan Dong (Tsinghua University)
Ramtin Keramati (Stanford University)
Emma Brunskill (Stanford University)

More from the Same Authors