
Inverse Transition Learning for Characterizing Near-Optimal Dynamics in Offline Reinforcement Learning
Leo Benac · Sonali Parbhoo · Finale Doshi-Velez

Offline reinforcement learning is commonly used for sequential decision-making in domains such as healthcare, where the rewards are known and the dynamics must be estimated from a single batch of data. A key challenge across these tasks is learning a reliable estimate of the dynamics that produces near-optimal policies that are safe to deploy in high-stakes settings. We propose a new gradient-free, constraint-based approach that captures our desiderata for reliably learning a set of dynamics. Our results demonstrate that using our constraints to estimate the model dynamics yields near-optimal policies while considerably reducing the policy's variance. We also show how combining uncertainty estimation with these constraints helps us infer a ranking of actions that produce higher returns, thereby enabling more interpretable, performant policies for planning overall.
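The setting described above can be illustrated with a minimal sketch: a toy tabular MDP where the rewards are known, the dynamics are estimated from a single batch of transitions, and a policy is then planned against the estimated model. Everything here (the MDP, the uniform behavior policy, the smoothed maximum-likelihood estimator used as a stand-in for the paper's constraint-based, gradient-free method) is an assumption for illustration, not the authors' actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline RL setup matching the abstract's setting: rewards known,
# dynamics unknown and estimated from one batch of data.
S, A, gamma = 3, 2, 0.95
R = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [1.0, 1.0]])          # known reward: being in state 2 pays off

# Hypothetical ground-truth dynamics, used only to generate the batch.
true_P = np.zeros((S, A, S))
true_P[:, 0] = [[0.1, 0.0, 0.9],    # action 0 tends to reach state 2
                [0.0, 0.1, 0.9],
                [0.0, 0.0, 1.0]]
true_P[:, 1] = [[0.9, 0.1, 0.0],    # action 1 tends to stay put
                [0.1, 0.9, 0.0],
                [0.0, 0.0, 1.0]]

# A single batch of transitions from a uniform behavior policy.
counts = np.zeros((S, A, S))
for _ in range(5000):
    s = rng.integers(S)
    a = rng.integers(A)
    s_next = rng.choice(S, p=true_P[s, a])
    counts[s, a, s_next] += 1

# Smoothed maximum-likelihood estimate of the dynamics (a simple stand-in
# for the paper's constrained, gradient-free estimator).
P_hat = (counts + 1.0) / (counts.sum(axis=2, keepdims=True) + S)

# Plan against the estimated model with value iteration.
V = np.zeros(S)
for _ in range(500):
    Q = R + gamma * P_hat @ V       # Q[s, a] under the estimated dynamics
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)
```

In this sketch the quality of the learned policy depends entirely on how well `P_hat` approximates the true dynamics from the batch, which is exactly the estimation problem the paper's constraints are designed to make reliable.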

Author Information

Leo Benac (Harvard University)

PhD student at Harvard in Reinforcement Learning

Sonali Parbhoo (Imperial College London)
Finale Doshi-Velez (Harvard University)

Finale Doshi-Velez is a Gordon McKay Professor in Computer Science at the Harvard Paulson School of Engineering and Applied Sciences. She completed her MSc from the University of Cambridge as a Marshall Scholar, her PhD from MIT, and her postdoc at Harvard Medical School. Her interests lie at the intersection of machine learning, healthcare, and interpretability. Selected Additional Shinies: BECA recipient, AFOSR YIP and NSF CAREER recipient; Sloan Fellow; IEEE AI Top 10 to Watch
