Bayesian Inverse Transition Learning for Offline Settings
Leo Benac · Sonali Parbhoo · Finale Doshi-Velez
Event URL: https://openreview.net/forum?id=GE1Wb4zApe

Offline reinforcement learning is commonly used for sequential decision-making in domains such as healthcare and education, where the rewards are known and the transition dynamics T must be estimated on the basis of batch data. A key challenge for all tasks is how to learn a reliable estimate of the transition dynamics T that produces near-optimal policies that are safe enough that they never take actions far from the best action with respect to their value functions, and informative enough that they communicate their uncertainties. Using an expert's feedback, we propose a new gradient-free, constraint-based approach that captures our desiderata for reliably learning a posterior distribution over the transition dynamics T. Our results demonstrate that by using our constraints, we learn a high-performing policy while considerably reducing the policy's variance over different datasets. We also explain how combining uncertainty estimation with these constraints can help us infer a partial ranking of actions that produce higher returns, and helps us infer safer and more informative policies for planning.
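The paper itself is behind the event URL above, but the idea described in the abstract can be illustrated with a minimal, hypothetical sketch: place a Dirichlet posterior over tabular transition dynamics (conjugate to transition counts from batch data), then apply a gradient-free, constraint-based filter that keeps only posterior samples under which the expert's actions remain near-optimal with respect to the induced value function. All sizes, names, and the rejection-style filter below are illustrative assumptions, not the authors' actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A = 3, 2      # hypothetical small tabular MDP
gamma = 0.9

# Batch data summarized as transition counts N[s, a, s'] (toy values).
counts = rng.integers(0, 5, size=(S, A, S)) + 1
R = rng.uniform(0, 1, size=(S, A))       # rewards are known, per the abstract
expert_action = np.argmax(R, axis=1)     # stand-in for the expert's feedback

def q_values(T, R, gamma, iters=200):
    """Tabular value iteration returning Q under candidate dynamics T."""
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)
        Q = R + gamma * (T @ V)          # T @ V has shape (S, A)
    return Q

def sample_constrained_posterior(n_samples=200, tol=0.05):
    """Draw dynamics from the Dirichlet posterior; keep samples whose
    induced Q-values make the expert's action near-optimal everywhere."""
    kept = []
    for _ in range(n_samples):
        T = np.stack([[rng.dirichlet(counts[s, a]) for a in range(A)]
                      for s in range(S)])
        Q = q_values(T, R, gamma)
        gap = Q.max(axis=1) - Q[np.arange(S), expert_action]
        if np.all(gap <= tol):           # the safety constraint
            kept.append(T)
    return kept

samples = sample_constrained_posterior()
```

The spread of the retained `samples` then quantifies the remaining uncertainty over T, which is what allows a partial ranking of actions rather than a single point estimate.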

Author Information

Leo Benac (Harvard University)

PhD student at Harvard University working on reinforcement learning

Sonali Parbhoo (Imperial College London)
Finale Doshi-Velez (Harvard University)
Finale Doshi-Velez is a Gordon McKay Professor in Computer Science at the Harvard Paulson School of Engineering and Applied Sciences. She completed her MSc from the University of Cambridge as a Marshall Scholar, her PhD from MIT, and her postdoc at Harvard Medical School. Her interests lie at the intersection of machine learning, healthcare, and interpretability. Selected Additional Shinies: BECA recipient, AFOSR YIP and NSF CAREER recipient; Sloan Fellow; IEEE AI Top 10 to Watch
