Oral
Fri Jul 13 07:00 AM -- 07:20 AM (PDT) @ A5
Causal Bandits with Propagating Inference
Akihiro Yabe · Daisuke Hatano · Hanna Sumita · Shinji Ito · Naonori Kakimura · Takuro Fukunaga · Ken-ichi Kawarabayashi
Bandit is a framework for designing sequential experiments, where a learner selects an arm A ∈ 𝒜 and obtains an observation corresponding to A in each experiment. Theoretically, the tight regret lower bound for the general bandit problem is polynomial in the number of arms |𝒜|; thus, to overcome this bound, the bandit problem with side information is often considered. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side information and the arms are identified with interventions on the causal graph. Existing algorithms for the causal bandit problem overcame the Ω(√(|𝒜|/T)) simple-regret lower bound; however, they work only when the interventions 𝒜 are localized around a single node (i.e., an intervention propagates only to its neighbors). We propose a novel causal bandit algorithm for an arbitrary set of interventions, whose effects can propagate throughout the causal graph. We also show that it achieves an O(√(γ* log(|𝒜|T)/T)) simple-regret bound, where γ* is determined by the structure of the causal graph. In particular, if the maximum in-degree of the causal graph is a constant, then γ* = O(N²), where N is the number of nodes.
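
To make the setting concrete, here is a minimal, self-contained Python sketch of the causal-bandit setup (not the authors' algorithm): arms are interventions do(X = x) on a hypothetical two-node causal graph X → Y, and a naive round-robin baseline estimates each arm's mean reward and is judged by its simple regret after T experiments. The graph, arm means, and budget T are all illustrative assumptions.

```python
import random

random.seed(0)

# Hypothetical two-node causal graph X -> Y: intervening on X shifts the
# distribution of the reward node Y. Each arm is an intervention do(X = x).
TRUE_MEANS = {0: 0.3, 1: 0.5, 2: 0.7}   # assumed mean reward per intervention

def sample_reward(arm):
    """Sample Y under intervention do(X = arm); Y ~ Bernoulli(TRUE_MEANS[arm])."""
    return 1.0 if random.random() < TRUE_MEANS[arm] else 0.0

arms = list(TRUE_MEANS)   # the intervention set, playing the role of 𝒜
T = 3000                  # experiment budget
counts = {a: 0 for a in arms}
sums = {a: 0.0 for a in arms}

for t in range(T):
    a = arms[t % len(arms)]        # round-robin exploration over all arms
    sums[a] += sample_reward(a)
    counts[a] += 1

means = {a: sums[a] / counts[a] for a in arms}
recommended = max(arms, key=means.get)

# Simple regret: gap between the best true mean and the true mean of the
# arm recommended after the T experiments.
print("recommended arm:", recommended)
print("simple regret:", max(TRUE_MEANS.values()) - TRUE_MEANS[recommended])
```

Because this baseline ignores the causal structure, its simple regret scales with the number of arms |𝒜|; the point of the paper's γ*-dependent bound is that exploiting the graph can do substantially better when interventions propagate through it.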