Poster
Wed Aug 09 01:30 AM -- 05:00 AM (PDT) @ Gallery #136
Counterfactual Data-Fusion for Online Reinforcement Learners
Andrew Forney · Judea Pearl · Elias Bareinboim

The Multi-Armed Bandit problem with Unobserved Confounders (MABUC) considers decision-making settings in which unmeasured variables can influence both the agent's decisions and its received rewards (Bareinboim et al., 2015). Recent findings showed that unobserved confounders (UCs) pose a unique challenge to algorithms based on standard randomization (i.e., experimental data): if UCs are naively averaged out, these algorithms behave sub-optimally, possibly incurring infinite regret. In this paper, we show how counterfactual-based decision-making circumvents these problems and leads to a coherent fusion of observational and experimental data. We then demonstrate this strategy in an enhanced Thompson Sampling bandit player and validate its efficacy with extensive simulations.
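To make the counterfactual (intent-conditioned) flavor of Thompson Sampling concrete, below is a minimal Python sketch. It assumes a hypothetical two-armed MABUC instance in which the agent's naive "intent" is driven by a binary unobserved confounder; the payout table, the per-intent Beta posteriors, and all names (e.g., `true_reward`, `intent`) are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2  # number of arms in this toy MABUC instance

# Beta posteriors over Bernoulli reward rates, kept separately for each
# "intent" condition (the arm the agent's naive heuristic would pull,
# which serves as a proxy for the unobserved confounder).
successes = np.ones((K, K))  # successes[intent, arm]
failures = np.ones((K, K))   # failures[intent, arm]

def true_reward(arm, uc):
    # Hypothetical payout table: the optimal arm depends on the UC.
    payout = np.array([[0.6, 0.4],
                       [0.3, 0.7]])
    return int(rng.random() < payout[uc, arm])

T = 5000
for t in range(T):
    uc = rng.integers(K)   # unobserved confounder (hidden from the learner)
    intent = uc            # the agent's naive/intuitive choice is driven by the UC

    # Counterfactual step: sample reward rates from the posterior
    # *conditioned on the current intent* and act on the best sample,
    # rather than averaging over the confounder.
    theta = rng.beta(successes[intent], failures[intent])
    arm = int(np.argmax(theta))

    r = true_reward(arm, uc)
    successes[intent, arm] += r
    failures[intent, arm] += 1 - r
```

The key design choice in this sketch is that posteriors are indexed by the intent condition, so the sampler estimates intent-specific (counterfactual) reward rates instead of a single confounded average.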