Oral
Cooperative Online Learning in Stochastic and Adversarial MDPs
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour
We study cooperative online learning in stochastic and adversarial Markov decision processes (MDPs). That is, in each episode, $m$ agents interact with an MDP simultaneously and share information in order to minimize their individual regret. We consider environments with two types of randomness: \emph{fresh} -- where each agent's trajectory is sampled i.i.d., and \emph{non-fresh} -- where the realization is shared by all agents (but each agent's trajectory is also affected by its own actions). More precisely, with non-fresh randomness the realization of every cost and transition is fixed at the start of each episode, and agents that take the same action in the same state at the same time observe the same cost and next state. We thoroughly analyze all relevant settings, highlight the challenges and differences between the models, and prove nearly-matching regret lower and upper bounds. To our knowledge, we are the first to consider cooperative reinforcement learning (RL) either with non-fresh randomness or in adversarial MDPs.
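The distinction between fresh and non-fresh randomness can be illustrated with a minimal sketch. This is not the authors' algorithm or code; it only simulates the two sampling models on a small tabular MDP. The names `make_mdp`, `sample_outcome`, and `run_episode`, the Bernoulli costs, and the fixed start state 0 are all assumptions made for illustration.

```python
import random

def make_mdp(num_states, num_actions, seed=0):
    """Build a random tabular MDP (hypothetical helper, for illustration only)."""
    rng = random.Random(seed)
    P, C = {}, {}
    for s in range(num_states):
        for a in range(num_actions):
            w = [rng.random() for _ in range(num_states)]
            total = sum(w)
            P[s, a] = [x / total for x in w]  # transition distribution over next states
            C[s, a] = rng.random()            # mean of a Bernoulli cost (assumption)
    return P, C

def sample_outcome(P, C, s, a, rng):
    """Draw one (cost, next_state) realization for taking action a in state s."""
    next_state = rng.choices(range(len(P[s, a])), weights=P[s, a])[0]
    cost = 1.0 if rng.random() < C[s, a] else 0.0
    return cost, next_state

def run_episode(P, C, policies, horizon, rng, fresh):
    """Run one episode for all agents; each policy maps (step, state) -> action."""
    table = {}
    if not fresh:
        # Non-fresh: fix the realization of every cost and transition
        # at the start of the episode, shared by all agents.
        for h in range(horizon):
            for (s, a) in P:
                table[h, s, a] = sample_outcome(P, C, s, a, rng)
    trajectories = []
    for policy in policies:
        s, traj = 0, []  # every agent starts in state 0 (assumption)
        for h in range(horizon):
            a = policy(h, s)
            if fresh:
                # Fresh: every agent draws its own independent sample.
                cost, nxt = sample_outcome(P, C, s, a, rng)
            else:
                # Non-fresh: same step, state, and action -> same outcome.
                cost, nxt = table[h, s, a]
            traj.append((s, a, cost))
            s = nxt
        trajectories.append(traj)
    return trajectories

if __name__ == "__main__":
    P, C = make_mdp(num_states=3, num_actions=2, seed=1)
    same_policy = lambda h, s: (h + s) % 2
    shared = run_episode(P, C, [same_policy, same_policy], 5, random.Random(7), fresh=False)
    print(shared[0] == shared[1])  # True: identical policies see identical realizations
```

Under non-fresh randomness, two agents that follow the same policy observe identical trajectories, so they gain nothing from each other unless they take different actions. Under fresh randomness, the same two agents would generally collect independent samples.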
Author Information
Tal Lancewicki (Tel Aviv University)
Aviv Rosenberg (Tel Aviv University)
Yishay Mansour (Google and Tel Aviv University)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Poster: Cooperative Online Learning in Stochastic and Adversarial MDPs »
Tue. Jul 19th through Wed. Jul 20th, Room Hall E #1217
More from the Same Authors
-
2021 : Minimax Regret for Stochastic Shortest Path »
Alon Cohen · Yonathan Efroni · Yishay Mansour · Aviv Rosenberg -
2021 : Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure »
Aviv Rosenberg · Yishay Mansour -
2021 : Learning Adversarial Markov Decision Processes with Delayed Feedback »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2022 : Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP »
Orin Levy · Yishay Mansour -
2022 : Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback »
Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg -
2023 Oral: Random Classification Noise does not defeat All Convex Potential Boosters Irrespective of Model Choice »
Yishay Mansour · Richard Nock · Robert C. Williamson -
2023 Poster: Reinforcement Learning Can Be More Efficient with Multiple Rewards »
Christoph Dann · Yishay Mansour · Mehryar Mohri -
2023 Poster: Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation »
Uri Sherman · Tomer Koren · Yishay Mansour -
2023 Poster: Regret Minimization and Convergence to Equilibria in General-sum Markov Games »
Liad Erez · Tal Lancewicki · Uri Sherman · Tomer Koren · Yishay Mansour -
2023 Poster: Concurrent Shuffle Differential Privacy Under Continual Observation »
Jay Tenenbaum · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2023 Poster: Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation »
Orin Levy · Alon Cohen · Asaf Cassel · Yishay Mansour -
2023 Poster: Random Classification Noise does not defeat All Convex Potential Boosters Irrespective of Model Choice »
Yishay Mansour · Richard Nock · Robert C. Williamson -
2023 Poster: Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback »
Tal Lancewicki · Aviv Rosenberg · Dmitry Sotnikov -
2022 Poster: FriendlyCore: Practical Differentially Private Aggregation »
Eliad Tsfadia · Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2022 Spotlight: FriendlyCore: Practical Differentially Private Aggregation »
Eliad Tsfadia · Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2021 Poster: Differentially-Private Clustering of Easy Instances »
Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer · Eliad Tsfadia -
2021 Spotlight: Differentially-Private Clustering of Easy Instances »
Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer · Eliad Tsfadia -
2021 Poster: Adversarial Dueling Bandits »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Spotlight: Adversarial Dueling Bandits »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Poster: Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions »
Tal Lancewicki · Shahar Segal · Tomer Koren · Yishay Mansour -
2021 Spotlight: Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions »
Tal Lancewicki · Shahar Segal · Tomer Koren · Yishay Mansour -
2021 Poster: Dueling Convex Optimization »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Spotlight: Dueling Convex Optimization »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2020 Poster: Optimistic Policy Optimization with Bandit Feedback »
Lior Shani · Yonathan Efroni · Aviv Rosenberg · Shie Mannor -
2020 Poster: Near-optimal Regret Bounds for Stochastic Shortest Path »
Aviv Rosenberg · Alon Cohen · Yishay Mansour · Haim Kaplan -
2019 Poster: Adversarial Online Learning with noise »
Alon Resler · Yishay Mansour -
2019 Poster: Online Convex Optimization in Adversarial Markov Decision Processes »
Aviv Rosenberg · Yishay Mansour -
2019 Poster: Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret »
Alon Cohen · Tomer Koren · Yishay Mansour -
2019 Poster: Differentially Private Learning of Geometric Concepts »
Haim Kaplan · Yishay Mansour · Yossi Matias · Uri Stemmer -
2019 Oral: Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret »
Alon Cohen · Tomer Koren · Yishay Mansour -
2019 Oral: Adversarial Online Learning with noise »
Alon Resler · Yishay Mansour -
2019 Oral: Differentially Private Learning of Geometric Concepts »
Haim Kaplan · Yishay Mansour · Yossi Matias · Uri Stemmer -
2019 Oral: Online Convex Optimization in Adversarial Markov Decision Processes »
Aviv Rosenberg · Yishay Mansour