Timezone: »
In this paper, we study combinatorial pure exploration for dueling bandits (CPE-DB): we have multiple candidates for multiple positions as modeled by a bipartite graph, and in each round we sample a duel of two candidates on one position and observe who wins in the duel, with the goal of finding the best candidate-position matching with high probability after multiple rounds of samples. CPE-DB is an adaptation of the original combinatorial pure exploration for multi-armed bandit (CPE-MAB) problem to the dueling bandit setting. We consider both the Borda winner and the Condorcet winner cases. For Borda winner, we establish a reduction of the problem to the original CPE-MAB setting and design PAC and exact algorithms that achieve both the sample complexity similar to that in the CPE-MAB setting (which is nearly optimal for a subclass of problems) and polynomial running time per round. For Condorcet winner, we first design a fully polynomial time approximation scheme (FPTAS) for the offline problem of finding the Condorcet winner with known winning probabilities, and then use the FPTAS as an oracle to design a novel pure exploration algorithm CAR-Cond with sample complexity analysis. CAR-Cond is the first algorithm with polynomial running time per round for identifying the Condorcet winner in CPE-DB.
Author Information
Wei Chen (Microsoft)
Yihan Du (IIIS, Tsinghua University)
Longbo Huang (Tsinghua University)
Haoyu Zhao (Tsinghua University)
More from the Same Authors
-
2021 : The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition »
Tiancheng Jin · Longbo Huang · Haipeng Luo -
2023 Poster: Task-Specific Skill Localization in Fine-tuned Language Models »
Abhishek Panigrahi · Nikunj Saunshi · Haoyu Zhao · Sanjeev Arora -
2023 Poster: Multi-task Representation Learning for Pure Exploration in Linear Bandits »
Yihan Du · Longbo Huang · Wen Sun -
2023 Poster: Bandit Multi-linear DR-Submodular Maximization and Its Applications on Adversarial Submodular Bandits »
Zongqi Wan · Jialin Zhang · Wei Chen · Xiaoming Sun · Zhijie Zhang -
2023 Poster: Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning »
Jiatai Huang · Yan Dai · Longbo Huang -
2023 Poster: Contextual Combinatorial Bandits with Probabilistically Triggered Arms »
Xutong Liu · Jinhang Zuo · Siwei Wang · John C.S. Lui · Mohammad Hajiesmaili · Adam Wierman · Wei Chen -
2022 Poster: Branching Reinforcement Learning »
Yihan Du · Wei Chen -
2022 Poster: Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification »
Ling Pan · Longbo Huang · Tengyu Ma · Huazhe Xu -
2022 Spotlight: Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification »
Ling Pan · Longbo Huang · Tengyu Ma · Huazhe Xu -
2022 Spotlight: Branching Reinforcement Learning »
Yihan Du · Wei Chen -
2022 Poster: Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation »
Pihe Hu · Yu Chen · Longbo Huang -
2022 Poster: Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably) »
Yu Huang · Junyang Lin · Chang Zhou · Hongxia Yang · Longbo Huang -
2022 Spotlight: Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably) »
Yu Huang · Junyang Lin · Chang Zhou · Hongxia Yang · Longbo Huang -
2022 Spotlight: Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation »
Pihe Hu · Yu Chen · Longbo Huang -
2022 Poster: Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits »
Jiatai Huang · Yan Dai · Longbo Huang -
2022 Spotlight: Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits »
Jiatai Huang · Yan Dai · Longbo Huang -
2021 Poster: Multi-layered Network Exploration via Random Walks: From Offline Optimization to Online Learning »
Xutong Liu · Jinhang Zuo · Xiaowei Chen · Wei Chen · John C. S. Lui -
2021 Oral: Multi-layered Network Exploration via Random Walks: From Offline Optimization to Online Learning »
Xutong Liu · Jinhang Zuo · Xiaowei Chen · Wei Chen · John C. S. Lui -
2021 Poster: Network Inference and Influence Maximization from Samples »
Wei Chen · Xiaoming Sun · Jialin Zhang · Zhijie Zhang -
2021 Oral: Network Inference and Influence Maximization from Samples »
Wei Chen · Xiaoming Sun · Jialin Zhang · Zhijie Zhang -
2021 Tutorial: Privacy in learning: Basics and the interplay »
Huishuai Zhang · Wei Chen -
2020 Poster: Optimization from Structured Samples for Coverage Functions »
Wei Chen · Xiaoming Sun · Jialin Zhang · Zhijie Zhang -
2020 Poster: (Locally) Differentially Private Combinatorial Semi-Bandits »
Xiaoyu Chen · Kai Zheng · Zixin Zhou · Yunchang Yang · Wei Chen · Liwei Wang -
2018 Poster: Thompson Sampling for Combinatorial Semi-Bandits »
Siwei Wang · Wei Chen -
2018 Oral: Thompson Sampling for Combinatorial Semi-Bandits »
Siwei Wang · Wei Chen