Timezone: »

Model-based Offline Reinforcement Learning with Local Misspecification
Kefan Dong · Ramtin Keramati · Emma Brunskill

In this paper we propose a model-based offline reinforcement learning algorithm that explicitly handles model misspecification and distribution mismatch. Theoretically, we prove a safe policy improvement theorem by establishing pessimism approximations to the value function. Our algorithm can output the best policy in the given policy class with interpretable error terms measuring misspecification level, distribution mismatch, and statistical deviation. In addition, as long as the model family can approximate the transitions of state-action pairs visits by a policy, we can approximate the value of that particular policy. We visualize the effect of error terms in the LQR setting, and show that the experiment results match our theory.

Author Information

Kefan Dong (Tsinghua University)
Ramtin Keramati (Stanford University)
Emma Brunskill (Stanford University)
Emma Brunskill

Emma Brunskill is an associate tenured professor in the Computer Science Department at Stanford University. Brunskill’s lab aims to create AI systems that learn from few samples to robustly make good decisions and is part of the Stanford AI Lab, the Stanford Statistical ML group, and AI Safety @Stanford. Brunskill has received a NSF CAREER award, Office of Naval Research Young Investigator Award, a Microsoft Faculty Fellow award and an alumni impact award from the computer science and engineering department at the University of Washington. Brunskill and her lab have received multiple best paper nominations and awards both for their AI and machine learning work (UAI best paper, Reinforcement Learning and Decision Making Symposium best paper twice) and for their work in Ai of education (Intelligent Tutoring Systems Conference, Educational Data Mining conference x3, CHI).

More from the Same Authors