Oral
Fri Jul 13 07:00 AM -- 07:20 AM (PDT) @ A5
Causal Bandits with Propagating Inference
Akihiro Yabe · Daisuke Hatano · Hanna Sumita · Shinji Ito · Naonori Kakimura · Takuro Fukunaga · Ken-ichi Kawarabayashi
Bandit is a framework for designing sequential experiments, where a learner selects an arm A ∈ 𝒜 and obtains an observation corresponding to A in each experiment. Theoretically, the tight regret lower bound for the general bandit problem is polynomial in the number of arms |𝒜|; thus, to overcome this bound, the bandit problem with side information is often considered. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side information and the arms are identified with interventions on the causal graph. Existing algorithms for the causal bandit problem overcame the Ω(√(|𝒜|/T)) simple-regret lower bound; however, they work only when the interventions 𝒜 are localized around a single node (i.e., an intervention propagates only to its neighbors). We propose a novel causal bandit algorithm for an arbitrary set of interventions, whose effects can propagate throughout the causal graph. We also show that it achieves an O(√(γ* log(|𝒜|T)/T)) simple-regret bound, where γ* is determined by the structure of the causal graph. In particular, if the maximum in-degree of the causal graph is a constant, then γ* = O(N²), where N is the number of nodes.
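
To make the setting concrete, here is a minimal, self-contained Python sketch of the causal-bandit setup (not the authors' algorithm): arms are interventions do(X = x) on a hypothetical two-node causal graph X → Y, and a naive round-robin baseline estimates each arm's mean reward and is judged by its simple regret after T experiments. The graph, arm means, and budget T are all illustrative assumptions.

```python
import random

random.seed(0)

# Hypothetical two-node causal graph X -> Y: intervening on X shifts the
# distribution of the reward node Y. Each arm is an intervention do(X = x).
TRUE_MEANS = {0: 0.3, 1: 0.5, 2: 0.7}   # assumed mean reward per intervention

def sample_reward(arm):
    """Sample Y under intervention do(X = arm); Y ~ Bernoulli(TRUE_MEANS[arm])."""
    return 1.0 if random.random() < TRUE_MEANS[arm] else 0.0

arms = list(TRUE_MEANS)   # the intervention set, playing the role of 𝒜
T = 3000                  # experiment budget
counts = {a: 0 for a in arms}
sums = {a: 0.0 for a in arms}

for t in range(T):
    a = arms[t % len(arms)]        # round-robin exploration over all arms
    sums[a] += sample_reward(a)
    counts[a] += 1

means = {a: sums[a] / counts[a] for a in arms}
recommended = max(arms, key=means.get)

# Simple regret: gap between the best true mean and the true mean of the
# arm recommended after the T experiments.
print("recommended arm:", recommended)
print("simple regret:", max(TRUE_MEANS.values()) - TRUE_MEANS[recommended])
```

Because this baseline ignores the causal structure, its simple regret scales with the number of arms |𝒜|; the point of the paper's γ*-dependent bound is that exploiting the graph can do substantially better when interventions propagate through it.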