Timezone: »
Poster
Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost
Sanae Amani · Tor Lattimore · Andras Gyorgy · Lin Yang
We study distributed contextual linear bandits with stochastic contexts, where $N$ agents/learners act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features over the course of $T$ rounds. For this problem, we derive the first ever information-theoretic lower bound $\Omega(dN)$ on the communication cost of any algorithm that performs optimally in a regret minimization setup. We then propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server. We prove that the communication cost of DisBE-LUCB, matches our lower bound up to logarithmic factors. In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is $\tilde{\mathcal{O}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds. We also provide similar bounds for practical settings where the context distribution can only be estimated. Therefore, our proposed algorithm is nearly minimax optimal in terms of *both regret and communication cost*. Finally, we propose DecBE-LUCB, a fully decentralized version of DisBE-LUCB, which operates without a central server, where agents share information with their *immediate neighbors* through a carefully designed consensus procedure.
Author Information
Sanae Amani (University of California, Los Angeles)
Tor Lattimore (DeepMind)
Andras Gyorgy (Google DeepMind)
Lin Yang (UCLA)
More from the Same Authors
-
2021 : Gap-Dependent Unsupervised Exploration for Reinforcement Learning »
Jingfeng Wu · Vladimir Braverman · Lin Yang -
2021 : Online Sub-Sampling for Reinforcement Learning with General Function Approximation »
Dingwen Kong · Ruslan Salakhutdinov · Ruosong Wang · Lin Yang -
2021 : Solving Multi-Arm Bandit Using a Few Bits of Communication »
Osama Hanna · Lin Yang · Christina Fragouli -
2022 : Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms »
MohammadJavad Azizi · Thang Duong · Yasin Abbasi-Yadkori · Claire Vernade · Andras Gyorgy · Mohammad Ghavamzadeh -
2022 : Provably Correct SGD-based Exploration for Linear Bandit »
Jialin Dong · Lin Yang -
2022 : Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning »
Dingwen Kong · Lin Yang -
2022 : Improved Generalization Bounds for Transfer Learning via Neural Collapse »
Tomer Galanti · Andras Gyorgy · Marcus Hutter -
2023 Poster: Understanding Self-Predictive Learning for Reinforcement Learning »
Yunhao Tang · Zhaohan Guo · Pierre Richemond · Bernardo Avila Pires · Yash Chandak · Remi Munos · Mark Rowland · Mohammad Gheshlaghi Azar · Charline Le Lan · Clare Lyle · Andras Gyorgy · Shantanu Thakoor · Will Dabney · Bilal Piot · Daniele Calandriello · Michal Valko -
2023 Poster: Does Sparsity Help in Learning Misspecified Linear Bandits? »
Jialin Dong · Lin Yang -
2023 Poster: Leveraging Demonstrations to Improve Online Learning: Quality Matters »
Botao Hao · Rahul Jain · Tor Lattimore · Benjamin Van Roy · Zheng Wen -
2023 Poster: Horizon-free Learning for Markov Decision Processes and Games: Stochastically Bounded Rewards and Improved Bounds »
Shengshi Li · Lin Yang -
2023 Poster: Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling »
Yunfan Li · Yiran Wang · Yu Cheng · Lin Yang -
2022 Poster: On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning »
Weichao Mao · Lin Yang · Kaiqing Zhang · Tamer Basar -
2022 Spotlight: On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning »
Weichao Mao · Lin Yang · Kaiqing Zhang · Tamer Basar -
2021 : Solving Multi-Arm Bandit Using a Few Bits of Communication »
Osama Hanna · Lin Yang · Christina Fragouli -
2021 Workshop: Workshop on Reinforcement Learning Theory »
Shipra Agrawal · Simon Du · Niao He · Csaba Szepesvari · Lin Yang -
2021 Poster: Adapting to Delays and Data in Adversarial Multi-Armed Bandits »
Andras Gyorgy · Pooria Joulani -
2021 Poster: Provably Correct Optimization and Exploration with Non-linear Policies »
Fei Feng · Wotao Yin · Alekh Agarwal · Lin Yang -
2021 Poster: Randomized Exploration in Reinforcement Learning with General Value Function Approximation »
Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang -
2021 Spotlight: Provably Correct Optimization and Exploration with Non-linear Policies »
Fei Feng · Wotao Yin · Alekh Agarwal · Lin Yang -
2021 Spotlight: Randomized Exploration in Reinforcement Learning with General Value Function Approximation »
Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang -
2021 Poster: Safe Reinforcement Learning with Linear Function Approximation »
Sanae Amani · Christos Thrampoulidis · Lin Yang -
2021 Spotlight: Adapting to Delays and Data in Adversarial Multi-Armed Bandits »
Andras Gyorgy · Pooria Joulani -
2021 Spotlight: Safe Reinforcement Learning with Linear Function Approximation »
Sanae Amani · Christos Thrampoulidis · Lin Yang -
2020 Poster: Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound »
Lin Yang · Mengdi Wang -
2020 Poster: Model-Based Reinforcement Learning with Value-Targeted Regression »
Alex Ayoub · Zeyu Jia · Csaba Szepesvari · Mengdi Wang · Lin Yang -
2020 Poster: Non-Stationary Delayed Bandits with Intermediate Observations »
Claire Vernade · Andras Gyorgy · Timothy Mann -
2020 Poster: A simpler approach to accelerated optimization: iterative averaging meets optimism »
Pooria Joulani · Anant Raj · Andras Gyorgy · Csaba Szepesvari -
2020 Poster: Nearly Linear Row Sampling Algorithm for Quantile Regression »
Yi Li · Ruosong Wang · Lin Yang · Hanrui Zhang -
2020 Poster: Obtaining Adjustable Regularization for Free via Iterate Averaging »
Jingfeng Wu · Vladimir Braverman · Lin Yang -
2019 Poster: Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems »
Timothy Mann · Sven Gowal · Andras Gyorgy · Huiyi Hu · Ray Jiang · Balaji Lakshminarayanan · Prav Srinivasan -
2019 Oral: Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems »
Timothy Mann · Sven Gowal · Andras Gyorgy · Huiyi Hu · Ray Jiang · Balaji Lakshminarayanan · Prav Srinivasan -
2019 Poster: CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · Andras Gyorgy · Csaba Szepesvari -
2019 Oral: CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · Andras Gyorgy · Csaba Szepesvari -
2018 Poster: LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · Andras Gyorgy · Csaba Szepesvari -
2018 Oral: LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · Andras Gyorgy · Csaba Szepesvari