Bayesian Hypothesis Testing Policy Regularization
Sarah Rathnam · Susan Murphy · Finale Doshi-Velez
Abstract
In reinforcement learning (RL), sparse feedback makes it difficult to target long-term outcomes, often resulting in high-variance policies. Real-world interventions instead rely on prior study data, expert input, or short-term proxies to guide exploration. In this work, we propose Bayesian Hypothesis Testing Policy Regularization (BHTPR), a method that integrates a previously learned policy with a policy learned online. BHTPR uses Bayesian hypothesis testing to determine, state by state, when to transfer the prior policy and when to rely on online learning.
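The abstract does not specify the form of the hypothesis test, but the state-by-state decision it describes can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' algorithm: Gaussian posteriors over each policy's value in the current state, a Monte Carlo estimate of the posterior odds that the online policy is better, and a Bayes-factor threshold for switching. The function name `select_policy_per_state` and its parameters are hypothetical.

```python
import numpy as np

def select_policy_per_state(
    mu_prior, var_prior,    # assumed Gaussian posterior over the prior policy's value here
    mu_online, var_online,  # assumed Gaussian posterior over the online policy's value here
    bayes_factor_threshold=3.0,
    n_samples=10_000,
    rng=None,
):
    """Decide, for a single state, whether to act with the prior policy
    or the policy learned online.

    H0: the prior policy's value is at least as high as the online policy's.
    H1: the online policy's value is strictly higher.

    With equal prior odds on H0 and H1, the Bayes factor reduces to the
    posterior odds p(H1 | data) / p(H0 | data), estimated by Monte Carlo.
    """
    rng = rng or np.random.default_rng(0)
    v_prior = rng.normal(mu_prior, np.sqrt(var_prior), n_samples)
    v_online = rng.normal(mu_online, np.sqrt(var_online), n_samples)
    p_h1 = np.mean(v_online > v_prior)        # posterior prob. online policy is better
    p_h0 = 1.0 - p_h1
    bayes_factor = p_h1 / max(p_h0, 1e-12)    # posterior odds for H1 over H0
    return "online" if bayes_factor > bayes_factor_threshold else "prior"
```

Under these assumptions, the test naturally retains the prior policy while the online value estimate is still uncertain and hands control over once the evidence is strong:

```python
# Uncertain online estimate: evidence is weak, so keep the prior policy.
print(select_policy_per_state(0.5, 0.01, 0.7, 0.5))   # -> "prior"
# Confident online estimate that is clearly higher: switch to online.
print(select_policy_per_state(0.5, 0.01, 0.9, 0.02))  # -> "online"
```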