

Poster in Workshop: “Could it have been different?” Counterfactuals in Minds and Machines

Inverse Transition Learning for Characterizing Near-Optimal Dynamics in Offline Reinforcement Learning

Leo Benac · Sonali Parbhoo · Finale Doshi-Velez


Abstract:

Offline reinforcement learning is commonly used for sequential decision-making in domains such as healthcare, where the rewards are known and the dynamics must be estimated from a single batch of data. A key challenge across such tasks is learning a reliable estimate of the dynamics that produces near-optimal policies which are safe to deploy in high-stakes settings. We propose a new gradient-free, constraint-based approach that captures our desiderata for reliably learning a set of dynamics. Our results demonstrate that by using our constraints to learn an estimate of the model dynamics, we obtain near-optimal policies while considerably reducing the policy's variance. We also show how combining uncertainty estimation with these constraints lets us infer a ranking of actions that produce higher returns, thereby enabling more interpretable and performant policies for planning overall.
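To make the setting concrete, here is a minimal illustrative sketch of the generic offline, model-based pipeline the abstract describes: rewards are known, tabular dynamics are estimated from a single batch, and posterior samples of the dynamics give an uncertainty-aware ranking of actions. This is an assumed toy setup (state/action sizes, Dirichlet posterior, and the lower-confidence-bound ranking heuristic are all illustrative choices), not the paper's constraint-based Inverse Transition Learning method.

```python
# Illustrative sketch of the generic offline-RL setting from the abstract,
# NOT the paper's constraint-based approach. All sizes and heuristics are assumptions.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.95             # toy MDP: 5 states, 3 actions
R = rng.uniform(0, 1, size=(S, A))   # rewards are assumed known

# A single batch of (state, action, next_state) transitions, as in offline RL.
batch = [(rng.integers(S), rng.integers(A), rng.integers(S)) for _ in range(500)]

# Dirichlet(1) prior + transition counts -> posterior over next-state distributions.
counts = np.ones((S, A, S))
for s, a, s_next in batch:
    counts[s, a, s_next] += 1
P_mean = counts / counts.sum(axis=2, keepdims=True)  # posterior-mean dynamics

def q_values(P):
    """Value iteration under a fixed transition model P; returns Q(s, a)."""
    Q = np.zeros((S, A))
    for _ in range(500):
        V = Q.max(axis=1)
        Q = R + gamma * P @ V
    return Q

# Posterior samples of the dynamics yield an uncertainty-aware ranking of actions:
# rank by a pessimistic (lower-percentile) Q estimate instead of the point estimate.
samples = np.stack([
    q_values(np.apply_along_axis(rng.dirichlet, 2, counts))
    for _ in range(20)
])
Q_lcb = np.percentile(samples, 10, axis=0)   # pessimistic Q estimate
ranking = np.argsort(-Q_lcb, axis=1)         # per-state action ranking

print("Greedy action per state (posterior-mean model):", q_values(P_mean).argmax(axis=1))
print("Pessimistic action ranking per state:\n", ranking)
```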
