Timezone: »
Despite the recent success of representation learning in sequential decision making, the study of the pure exploration scenario (i.e., identify the best option and minimize the sample complexity) is still limited. In this paper, we study multi-task representation learning for best arm identification in linear bandit (RepBAI-LB) and best policy identification in contextual linear bandit (RepBPI-CLB), two popular pure exploration settings with wide applications, e.g., clinical trials and web content optimization. In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks. For these problems, we design computationally and sample efficient algorithms DouExpDes and C-DouExpDes, which perform double experimental designs to plan optimal sample allocations for learning the global representation. We show that by learning the common representation among tasks, our sample complexity is significantly better than that of the native approach which solves tasks independently. To the best of our knowledge, this is the first work to demonstrate the benefits of representation learning for multi-task pure exploration.
Author Information
Yihan Du (IIIS, Tsinghua University)
Longbo Huang (Tsinghua University)
Wen Sun (Cornell University)
More from the Same Authors
-
2021 : The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition »
Tiancheng Jin · Longbo Huang · Haipeng Luo -
2021 : Corruption Robust Offline Reinforcement Learning »
Xuezhou Zhang · Yiding Chen · Jerry Zhu · Wen Sun -
2021 : Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage »
Jonathan Chang · Masatoshi Uehara · Dhruv Sreenivas · Rahul Kidambi · Wen Sun -
2021 : MobILE: Model-Based Imitation Learning From Observation Alone »
Rahul Kidambi · Jonathan Chang · Wen Sun -
2023 : Representation Learning in Low-rank Slate-based Recommender Systems »
Yijia Dai · Wen Sun -
2023 : Provable Offline Reinforcement Learning with Human Feedback »
Wenhao Zhan · Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun -
2023 : Contextual Bandits and Imitation Learning with Preference-Based Active Queries »
Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu -
2023 : Selective Sampling and Imitation Learning via Online Regression »
Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu -
2023 : Provable Offline Reinforcement Learning with Human Feedback »
Wenhao Zhan · Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun -
2023 : How to Query Human Feedback Efficiently in RL? »
Wenhao Zhan · Masatoshi Uehara · Wen Sun · Jason Lee -
2023 : Contextual Bandits and Imitation Learning with Preference-Based Active Queries »
Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu -
2023 : How to Query Human Feedback Efficiently in RL? »
Wenhao Zhan · Masatoshi Uehara · Wen Sun · Jason Lee -
2023 Poster: Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR »
Kaiwen Wang · Nathan Kallus · Wen Sun -
2023 Poster: Distributional Offline Policy Evaluation with Predictive Error Guarantees »
Runzhe Wu · Masatoshi Uehara · Wen Sun -
2023 Poster: Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings »
Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun -
2023 Poster: Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning »
Jiatai Huang · Yan Dai · Longbo Huang -
2022 Poster: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach »
Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun -
2022 Poster: Learning Bellman Complete Representations for Offline Policy Evaluation »
Jonathan Chang · Kaiwen Wang · Nathan Kallus · Wen Sun -
2022 Poster: Branching Reinforcement Learning »
Yihan Du · Wei Chen -
2022 Poster: Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification »
Ling Pan · Longbo Huang · Tengyu Ma · Huazhe Xu -
2022 Spotlight: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach »
Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun -
2022 Oral: Learning Bellman Complete Representations for Offline Policy Evaluation »
Jonathan Chang · Kaiwen Wang · Nathan Kallus · Wen Sun -
2022 Spotlight: Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification »
Ling Pan · Longbo Huang · Tengyu Ma · Huazhe Xu -
2022 Spotlight: Branching Reinforcement Learning »
Yihan Du · Wei Chen -
2022 Poster: Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation »
Pihe Hu · Yu Chen · Longbo Huang -
2022 Poster: Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably) »
Yu Huang · Junyang Lin · Chang Zhou · Hongxia Yang · Longbo Huang -
2022 Spotlight: Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably) »
Yu Huang · Junyang Lin · Chang Zhou · Hongxia Yang · Longbo Huang -
2022 Spotlight: Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation »
Pihe Hu · Yu Chen · Longbo Huang -
2022 Poster: Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits »
Jiatai Huang · Yan Dai · Longbo Huang -
2022 Spotlight: Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits »
Jiatai Huang · Yan Dai · Longbo Huang -
2021 Poster: Fairness of Exposure in Stochastic Bandits »
Luke Lequn Wang · Yiwei Bai · Wen Sun · Thorsten Joachims -
2021 Spotlight: Fairness of Exposure in Stochastic Bandits »
Luke Lequn Wang · Yiwei Bai · Wen Sun · Thorsten Joachims -
2021 Poster: Robust Policy Gradient against Strong Data Corruption »
Xuezhou Zhang · Yiding Chen · Jerry Zhu · Wen Sun -
2021 Spotlight: Robust Policy Gradient against Strong Data Corruption »
Xuezhou Zhang · Yiding Chen · Jerry Zhu · Wen Sun -
2021 Poster: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang -
2021 Oral: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang -
2021 Poster: PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration »
Yuda Song · Wen Sun -
2021 Spotlight: PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration »
Yuda Song · Wen Sun -
2020 Poster: Combinatorial Pure Exploration for Dueling Bandit »
Wei Chen · Yihan Du · Longbo Huang · Haoyu Zhao