Timezone: »

Under-exploring in Bandits with Confounded Data
Nihal Sharma · Soumya Basu · Karthikeyan Shanmugam · Sanjay Shakkottai

We study the problem of Multi-Armed Bandits with mean bounds where each arm is associated with an interval in which its mean reward lies. We develop the GLobal Under-Explore (GLUE) algorithm which, for each arm, uses these intervals to infer pseudo-variances'' that instruct the rate of exploration. We provide regret guarantees for GLUE and show that it is never worse than the standard Upper Confidence Bound Algorithm. Further, we show regimes in which GLUE improves upon existing regret guarantees for structured bandit problems. Finally, we present the practical setting of learning adaptive interventions using prior confounded data in which unrecorded variables affect rewards. We show that mean bounds for each intervention can be extracted from such logs and can thus be used to improve the learning process. We also provide semi-synthetic experiments on real-world data sets to validate our findings.