Poster in Workshop: Workshop on Reinforcement Learning Theory
Model-based Offline Reinforcement Learning with Local Misspecification
Kefan Dong · Ramtin Keramati · Emma Brunskill
In this paper, we propose a model-based offline reinforcement learning algorithm that explicitly handles model misspecification and distribution mismatch. Theoretically, we prove a safe policy improvement theorem by establishing pessimistic approximations to the value function. Our algorithm outputs the best policy in the given policy class up to interpretable error terms that measure the misspecification level, the distribution mismatch, and the statistical deviation. In addition, as long as the model family can approximate the transitions at the state-action pairs visited by a policy, we can approximate the value of that policy. We visualize the effect of these error terms in the linear quadratic regulator (LQR) setting and show that the experimental results match our theory.
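The pessimistic model-based selection idea described above can be illustrated with a small tabular sketch. This is not the paper's algorithm: the random MDP, the uniform behavior policy, the 1/sqrt(count) statistical-deviation penalty, and the occupancy-weighted mismatch proxy are all assumptions made here for illustration. The sketch fits a model from offline data, evaluates each candidate policy under the fitted model, subtracts a penalty, and selects the policy with the highest pessimistic value.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Small random tabular MDP (illustrative stand-in for the offline RL setting) ---
S, A, gamma = 5, 2, 0.9
P_true = rng.dirichlet(np.ones(S), size=(S, A))   # true transitions: P_true[s, a] is a dist over s'
R = rng.uniform(0.0, 1.0, size=(S, A))            # rewards assumed known, for simplicity

def policy_value(P, pi, R, gamma):
    """Exact discounted value of a deterministic policy pi under transition model P."""
    P_pi = P[np.arange(S), pi]                    # (S, S) transition matrix induced by pi
    r_pi = R[np.arange(S), pi]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

# --- Offline data collected by a uniform behavior policy ---
n_samples = 2000
counts = np.zeros((S, A, S))
for _ in range(n_samples):
    s, a = rng.integers(S), rng.integers(A)
    s_next = rng.choice(S, p=P_true[s, a])
    counts[s, a, s_next] += 1

n_sa = counts.sum(axis=-1)                                # visit counts per (s, a)
P_hat = (counts + 1e-6) / (n_sa[..., None] + S * 1e-6)    # smoothed empirical model

# --- Pessimistic evaluation of a small policy class ---
def pessimistic_value(pi, c=1.0):
    """Fitted-model value minus an illustrative penalty: a 1/sqrt(count) deviation term,
    weighted by the policy's discounted occupancy under the fitted model (a crude
    proxy for distribution mismatch with the offline data)."""
    v_hat = policy_value(P_hat, pi, R, gamma)
    P_pi = P_hat[np.arange(S), pi]
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, np.ones(S) / S)   # discounted occupancy
    penalty = c * np.sum(d / np.sqrt(np.maximum(n_sa[np.arange(S), pi], 1.0)))
    return v_hat.mean() - penalty

policy_class = [rng.integers(A, size=S) for _ in range(20)]   # random deterministic policies
best = max(policy_class, key=pessimistic_value)

print("pessimistic value of selected policy:", pessimistic_value(best))
print("true value of selected policy       :", policy_value(P_true, best, R, gamma).mean())
```

In this sketch, the penalty plays the role of the interpretable error terms in the abstract: it shrinks as coverage of the policy's visited state-action pairs improves, so the pessimistic estimate lower-bounds the value only where the offline data supports it.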