Timezone: »
Poster
Branching Reinforcement Learning
Yihan Du · Wei Chen
In this paper, we propose a novel Branching Reinforcement Learning (Branching RL) model, and investigate both Regret Minimization (RM) and Reward-Free Exploration (RFE) metrics for this model. Unlike standard RL where the trajectory of each episode is a single $H$-step path, branching RL allows an agent to take multiple base actions in a state such that transitions branch out to multiple successor states correspondingly, and thus it generates a tree-structured trajectory. This model finds important applications in hierarchical recommendation systems and online advertising. For branching RL, we establish new Bellman equations and key lemmas, i.e., branching value difference lemma and branching law of total variance, and also bound the total variance by only $O(H^2)$ under an exponentially-large trajectory. For RM and RFE metrics, we propose computationally efficient algorithms BranchVI and BranchRFE, respectively, and derive nearly matching upper and lower bounds. Our regret and sample complexity results are polynomial in all problem parameters despite exponentially-large trajectories.
Author Information
Yihan Du (IIIS, Tsinghua University)
Wei Chen (Microsoft)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Spotlight: Branching Reinforcement Learning »
Thu. Jul 21st 03:45 -- 03:50 PM Room Ballroom 3 & 4
More from the Same Authors
-
2021 Poster: Multi-layered Network Exploration via Random Walks: From Offline Optimization to Online Learning »
Xutong Liu · Jinhang Zuo · Xiaowei Chen · Wei Chen · John C. S. Lui -
2021 Oral: Multi-layered Network Exploration via Random Walks: From Offline Optimization to Online Learning »
Xutong Liu · Jinhang Zuo · Xiaowei Chen · Wei Chen · John C. S. Lui -
2021 Poster: Network Inference and Influence Maximization from Samples »
Wei Chen · Xiaoming Sun · Jialin Zhang · Zhijie Zhang -
2021 Oral: Network Inference and Influence Maximization from Samples »
Wei Chen · Xiaoming Sun · Jialin Zhang · Zhijie Zhang -
2021 Tutorial: Privacy in learning: Basics and the interplay »
Huishuai Zhang · Wei Chen -
2020 Poster: Optimization from Structured Samples for Coverage Functions »
Wei Chen · Xiaoming Sun · Jialin Zhang · Zhijie Zhang -
2020 Poster: (Locally) Differentially Private Combinatorial Semi-Bandits »
Xiaoyu Chen · Kai Zheng · Zixin Zhou · Yunchang Yang · Wei Chen · Liwei Wang -
2020 Poster: Combinatorial Pure Exploration for Dueling Bandit »
Wei Chen · Yihan Du · Longbo Huang · Haoyu Zhao -
2018 Poster: Thompson Sampling for Combinatorial Semi-Bandits »
Siwei Wang · Wei Chen -
2018 Oral: Thompson Sampling for Combinatorial Semi-Bandits »
Siwei Wang · Wei Chen