Timezone: »
Oral
Thompson Sampling for Combinatorial Semi-Bandits
Siwei Wang · Wei Chen
We study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB, and obtain the first distribution-dependent regret bound of $O(m\log T / \Delta_{\min}) $ for TS under general CMAB, where $m$ is the number of arms, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal solution and any non-optimal solution. We also show that one cannot use an approximate oracle in TS algorithm for even MAB problems. Then we expand the analysis to matroid bandit, a special case of CMAB and for which we could remove the independence assumption across arms and achieve a better regret bound. Finally, we use some experiments to show the comparison of regrets of CUCB and CTS algorithms.
Author Information
Siwei Wang (Tsinghua University)
Wei Chen (Microsoft)
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Poster: Thompson Sampling for Combinatorial Semi-Bandits »
Thu. Jul 12th 04:15 -- 07:00 PM Room Hall B #57
More from the Same Authors
-
2022 Poster: Branching Reinforcement Learning »
Yihan Du · Wei Chen -
2022 Spotlight: Branching Reinforcement Learning »
Yihan Du · Wei Chen -
2021 Poster: Multi-layered Network Exploration via Random Walks: From Offline Optimization to Online Learning »
Xutong Liu · Jinhang Zuo · Xiaowei Chen · Wei Chen · John C. S. Lui -
2021 Oral: Multi-layered Network Exploration via Random Walks: From Offline Optimization to Online Learning »
Xutong Liu · Jinhang Zuo · Xiaowei Chen · Wei Chen · John C. S. Lui -
2021 Poster: Network Inference and Influence Maximization from Samples »
Wei Chen · Xiaoming Sun · Jialin Zhang · Zhijie Zhang -
2021 Oral: Network Inference and Influence Maximization from Samples »
Wei Chen · Xiaoming Sun · Jialin Zhang · Zhijie Zhang -
2021 Tutorial: Privacy in learning: Basics and the interplay »
Huishuai Zhang · Wei Chen -
2020 Poster: Optimization from Structured Samples for Coverage Functions »
Wei Chen · Xiaoming Sun · Jialin Zhang · Zhijie Zhang -
2020 Poster: (Locally) Differentially Private Combinatorial Semi-Bandits »
Xiaoyu Chen · Kai Zheng · Zixin Zhou · Yunchang Yang · Wei Chen · Liwei Wang -
2020 Poster: Combinatorial Pure Exploration for Dueling Bandit »
Wei Chen · Yihan Du · Longbo Huang · Haoyu Zhao