Talk
Counterfactual Data-Fusion for Online Reinforcement Learners
Andrew Forney · Judea Pearl · Elias Bareinboim

Tue Aug 08 11:24 PM -- 11:42 PM (PDT) @ C4.1

The Multi-Armed Bandit problem with Unobserved Confounders (MABUC) considers decision-making settings where unmeasured variables can influence both the agent's decisions and received rewards (Bareinboim et al., 2015). Recent findings showed that unobserved confounders (UCs) pose a unique challenge to algorithms based on standard randomization (i.e., experimental data); if UCs are naively averaged out, these algorithms behave sub-optimally, possibly incurring infinite regret. In this paper, we show how counterfactual-based decision-making circumvents these problems and leads to a coherent fusion of observational and experimental data. We then demonstrate this new strategy in an enhanced Thompson Sampling bandit player, and support its efficacy with extensive simulations.
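For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of standard Beta-Bernoulli Thompson Sampling. It is not the enhanced, counterfactual player the talk presents, and it does not model unobserved confounders. The function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the example payout rates are illustrative assumptions, not details taken from the paper.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run plain Beta-Bernoulli Thompson Sampling (illustrative baseline only).

    true_probs: hypothetical per-arm payout rates, hidden from the agent.
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) prior on each arm's payout rate (an assumption of this sketch).
    alpha = [1] * n_arms
    beta = [1] * n_arms
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one plausible payout rate per arm from its current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Play the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=samples.__getitem__)
        # Observe a Bernoulli reward from the (hidden) true rate.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update for the arm that was played.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.2, 0.8], n_rounds=2000)
    print("pulls per arm:", pulls)
    print("total reward:", total)
```

In this unconfounded setting the sampler concentrates its pulls on the better arm. The abstract's point is that this kind of purely experimental learner can behave sub-optimally once an unobserved variable drives both the agent's choice and the reward.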

#### Author Information

##### Elias Bareinboim (Purdue)

Elias Bareinboim is an associate professor in the Department of Computer Science and the director of the Causal Artificial Intelligence (CausalAI) Laboratory at Columbia University. His research focuses on causal and counterfactual inference and their applications to artificial intelligence and machine learning as well as data-driven fields in the health and social sciences. His work was the first to propose a general solution to the problem of "causal data-fusion," providing practical methods for combining datasets generated under different experimental conditions and plagued with various biases. In recent years, Bareinboim has been exploring the intersection of causal inference with decision-making (including reinforcement learning) and explainability (including fairness analysis). Before joining Columbia, he was an assistant professor at Purdue University and received his Ph.D. in Computer Science from the University of California, Los Angeles. Bareinboim was named one of "AI's 10 to Watch" by IEEE, and is a recipient of an NSF CAREER Award, the Dan David Prize Scholarship, the 2014 AAAI Outstanding Paper Award, and the 2019 UAI Best Paper Award.