Timezone: »

Thompson Sampling for Combinatorial Semi-Bandits
Siwei Wang · Wei Chen

Thu Jul 12 09:15 AM -- 12:00 PM (PDT) @ Hall B #57
We study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB, and obtain the first distribution-dependent regret bound of $O(m\log T / \Delta_{\min}) $ for TS under general CMAB, where $m$ is the number of arms, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal solution and any non-optimal solution. We also show that one cannot use an approximate oracle in TS algorithm for even MAB problems. Then we expand the analysis to matroid bandit, a special case of CMAB and for which we could remove the independence assumption across arms and achieve a better regret bound. Finally, we use some experiments to show the comparison of regrets of CUCB and CTS algorithms.

Author Information

Siwei Wang (Tsinghua University)
Wei Chen (Microsoft)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors