

Poster in Workshop: Interactive Learning with Implicit Human Feedback

Bayesian Inverse Transition Learning for Offline Settings

Leo Benac · Sonali Parbhoo · Finale Doshi-Velez


Abstract:

Offline reinforcement learning is commonly used for sequential decision-making in domains such as healthcare and education, where the rewards are known and the transition dynamics T must be estimated on the basis of batch data. A key challenge in all such tasks is learning a reliable estimate of the transition dynamics T that produces near-optimal policies that are safe, in that they never take actions far from the best action with respect to their value functions, and informative, in that they communicate the uncertainties they have. Using an expert's feedback, we propose a new constraint-based approach that captures these desiderata for reliably learning a posterior distribution over the transition dynamics T, and that is free of gradients. Our results demonstrate that by using our constraints, we learn a high-performing policy while considerably reducing the policy's variance across different datasets. We also explain how combining uncertainty estimation with these constraints helps us infer a partial ranking of actions that produce higher returns, and helps us infer safer and more informative policies for planning.
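
Below is a minimal, hypothetical sketch of the general idea the abstract describes: place a Bayesian posterior on the transition dynamics T estimated from batch counts, and keep only posterior samples under which an expert's actions remain near-optimal, which is a gradient-free way to impose such a constraint. This is not the authors' algorithm; the Dirichlet posterior, the rejection-sampling step, and all names and thresholds (sample_T, value_iteration, eps, counts) are illustrative assumptions.

import numpy as np

S, A, gamma, eps = 5, 3, 0.95, 0.5   # states, actions, discount, tolerance (illustrative)
rng = np.random.default_rng(0)

# Batch data summarized as transition counts N[s, a, s']; rewards are known,
# matching the offline setting described in the abstract.
counts = rng.integers(0, 20, size=(S, A, S))
R = rng.uniform(0.0, 1.0, size=(S, A))

def value_iteration(T, iters=300):
    """Q-values for a given transition tensor T[s, a, s']."""
    Q = np.zeros((S, A))
    for _ in range(iters):
        Q = R + gamma * T @ Q.max(axis=1)
    return Q

def sample_T():
    """One draw from the Dirichlet posterior over T (uniform prior: counts + 1)."""
    T = np.empty((S, A, S))
    for s in range(S):
        for a in range(A):
            T[s, a] = rng.dirichlet(counts[s, a] + 1.0)
    return T

# For the demo, let the "expert" act greedily under the posterior-mean T,
# so the constraint below is satisfiable with reasonable probability.
T_mean = (counts + 1.0) / (counts + 1.0).sum(axis=2, keepdims=True)
expert_action = value_iteration(T_mean).argmax(axis=1)

# Gradient-free constrained inference via rejection sampling: keep only
# posterior samples under which the expert's action is within eps of the
# best action in every state.
accepted = []
for _ in range(2000):
    T = sample_T()
    Q = value_iteration(T)
    if np.all(Q.max(axis=1) - Q[np.arange(S), expert_action] <= eps):
        accepted.append(Q)
assert accepted, "no samples satisfied the constraint; relax eps"

# Partial ranking of actions: posterior probability that each action is
# within eps of the best action, averaged over accepted samples.
Qs = np.stack(accepted)                          # (n_samples, S, A)
near_opt = Qs.max(axis=2, keepdims=True) - Qs <= eps
print("P(action near-optimal | data, constraint):")
print(near_opt.mean(axis=0))                     # (S, A) ranking scores

Rejection sampling is just one simple way to realize a gradient-free constrained posterior; its acceptance rate degrades as the constraint tightens or the state space grows, at which point a more targeted sampler would be needed.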
