Timezone: »

Causal Bandits with Propagating Inference
Akihiro Yabe · Daisuke Hatano · Hanna Sumita · Shinji Ito · Naonori Kakimura · Takuro Fukunaga · Ken-ichi Kawarabayashi

Fri Jul 13 07:00 AM -- 07:20 AM (PDT) @ A5
Bandit is a framework for designing sequential experiments, where a learner selects an arm $A \in \mathcal{A}$ and obtains an observation corresponding to $A$ in each experiment. Theoretically, the tight regret lower-bound for the general bandit is polynomial with respect to the number of arms $|\mathcal{A}|$, and thus, to overcome this bound, the bandit problem with side-information is often considered. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side-information and the arms are identified with interventions on the causal graph. Existing algorithms for causal bandit overcame the $\Omega(\sqrt{|\mathcal{A}|/T})$ simple-regret lower-bound; however, their algorithms work only when the interventions $\mathcal{A}$ are localized around a single node (i.e., an intervention propagates only to its neighbors). We then propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves $O(\sqrt{ \gamma^*\log(|\mathcal{A}|T) / T})$ regret bound, where $\gamma^*$ is determined by using a causal graph structure. In particular, if the maximum in-degree of the causal graph is a constant, then $\gamma^* = O(N^2)$, where $N$ is the number of nodes.

Author Information

Akihiro Yabe (NEC Corporation)
Daisuke Hatano (RIKEN AIP)
Hanna Sumita (National Institute of Informatics)
Shinji Ito (NEC Corporation)
Naonori Kakimura (Keio University)
Takuro Fukunaga (RIKEN AIP)
Ken-ichi Kawarabayashi (National Institute of Informatics)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors