
Workshop: Workshop on Reinforcement Learning Theory

Model-based Offline Reinforcement Learning with Local Misspecification

Kefan Dong · Ramtin Keramati · Emma Brunskill


In this paper we propose a model-based offline reinforcement learning algorithm that explicitly handles model misspecification and distribution mismatch. Theoretically, we prove a safe policy improvement theorem by establishing a pessimistic approximation to the value function. Our algorithm outputs the best policy in the given policy class, with interpretable error terms that measure the misspecification level, the distribution mismatch, and the statistical deviation. In addition, as long as the model family can approximate the transitions at state-action pairs visited by a policy, we can approximate the value of that policy. We visualize the effect of the error terms in the LQR setting and show that the experimental results match our theory.
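The core idea (a pessimistic, model-based value estimate used to select a policy from offline data) can be illustrated with a minimal sketch. This is not the paper's algorithm: the 1-D LQR-style dynamics, the penalty weight `lam`, and the uniform per-step residual bound are all assumptions made here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# True dynamics: x' = a*x + b*u plus a mild nonlinearity, so that a
# linear model class is (locally) misspecified.
a_true, b_true = 0.9, 0.5
def step(x, u):
    return a_true * x + b_true * u + 0.05 * np.tanh(x)

# Offline dataset collected by a fixed behavior policy u = -0.3*x + noise.
xs, us, xps = [], [], []
x = 1.0
for _ in range(500):
    u = -0.3 * x + 0.1 * rng.standard_normal()
    xp = step(x, u)
    xs.append(x); us.append(u); xps.append(xp)
    x = xp

X = np.column_stack([xs, us])                   # features (x, u)
y = np.array(xps)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)   # fitted linear model (a_hat, b_hat)
resid = y - X @ theta                           # misspecification + noise residuals

def pessimistic_value(k, horizon=50, lam=1.0, gamma=0.95):
    """Model-based return of the policy u = -k*x, minus a pessimism
    penalty driven by the model's residual error (an illustrative,
    uniform per-step bound -- not the paper's error decomposition)."""
    x, value, penalty = 1.0, 0.0, 0.0
    sigma = np.abs(resid).mean()                # crude model-error estimate
    for t in range(horizon):
        u = -k * x
        r = -(x ** 2 + 0.1 * u ** 2)            # LQR-style negative cost
        value += gamma ** t * r
        penalty += gamma ** t * sigma
        x = theta[0] * x + theta[1] * u         # roll out in the learned model
    return value - lam * penalty

# Select the policy gain with the best pessimistic value estimate.
gains = np.linspace(0.0, 1.5, 16)
best_k = max(gains, key=pessimistic_value)
print(float(best_k))
```

The penalty term plays the role of the interpretable error terms in the theorem: a policy that drives the rollout into regions where the learned model fits poorly is penalized, so the selected policy comes with a conservative (pessimistic) value guarantee under the stated assumptions.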
