Timezone: »

 
Avoiding Overfitting to the Importance Weights in Offline Policy Optimization
Yao Liu · Emma Brunskill

Offline policy optimization has a critical impact on many real-world decision-making problems, as online learning is costly and concerning in many applications. Importance sampling and its variants are a widely used type of estimator in offline policy evaluation, which can be helpful to remove dependence on the chosen function approximations used to represent value functions and process models. In this paper, we identify an important overfitting phenomenon in optimizing the importance weighted return, and propose an algorithm to avoid this overfitting. We provide a theoretical justification of the proposed algorithm through a better per-state-neighborhood normalization condition and show the limitation of previous attempts to this approach through an illustrative example. We further test our proposed method in a healthcare-inspired simulator and a logged dataset collected from real hospitals. These experiments show the proposed method with less overfitting and better test performance compared with state-of-the-art batch reinforcement learning algorithms.

Author Information

Yao Liu (Stanford University)
Emma Brunskill (Stanford University)
Emma Brunskill

Emma Brunskill is an associate tenured professor in the Computer Science Department at Stanford University. Brunskill’s lab aims to create AI systems that learn from few samples to robustly make good decisions and is part of the Stanford AI Lab, the Stanford Statistical ML group, and AI Safety @Stanford. Brunskill has received a NSF CAREER award, Office of Naval Research Young Investigator Award, a Microsoft Faculty Fellow award and an alumni impact award from the computer science and engineering department at the University of Washington. Brunskill and her lab have received multiple best paper nominations and awards both for their AI and machine learning work (UAI best paper, Reinforcement Learning and Decision Making Symposium best paper twice) and for their work in Ai of education (Intelligent Tutoring Systems Conference, Educational Data Mining conference x3, CHI).

More from the Same Authors