Timezone: »
Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg
The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately. However, in practice feedback is often observed in delay. This paper studies online learning in episodic Markov decision process (MDP) with unknown transitions, adversarially changing costs, and unrestricted delayed bandit feedback. More precisely, the feedback for the agent in episode $k$ is revealed only in the end of episode $k + d^k$, where the delay $d^k$ can be changing over episodes and chosen by an oblivious adversary. We present the first algorithms that achieve near-optimal $\sqrt{K + D}$ regret, where $K$ is the number of episodes and $D = \sum_{k=1}^K d^k$ is the total delay, significantly improving upon the best known regret bound of $(K + D)^{2/3}$.
Author Information
Tiancheng Jin (University of Southern California)
Tal Lancewicki (Tel-Aviv University)
Haipeng Luo (University of Southern California)
Yishay Mansour (Google and Tel Aviv University)
Aviv Rosenberg (Tel Aviv University)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 : Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback »
Sat. Jul 23rd 03:30 -- 03:50 PM Room
More from the Same Authors
-
2021 : Minimax Regret for Stochastic Shortest Path »
Alon Cohen · Yonathan Efroni · Yishay Mansour · Aviv Rosenberg -
2021 : Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure »
Aviv Rosenberg · Yishay Mansour -
2021 : Learning Adversarial Markov Decision Processes with Delayed Feedback »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2021 : Online Learning for Stochastic Shortest Path Model via Posterior Sampling »
Mehdi Jafarnia · Liyu Chen · Rahul Jain · Haipeng Luo -
2021 : The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition »
Tiancheng Jin · Longbo Huang · Haipeng Luo -
2021 : Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses »
Haipeng Luo · Chen-Yu Wei · Chung-Wei Lee -
2021 : Implicit Finite-Horizon Approximation for Stochastic Shortest Path »
Liyu Chen · Mehdi Jafarnia · Rahul Jain · Haipeng Luo -
2022 : Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP »
Orin Levy · Yishay Mansour -
2023 : Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games »
Yang Cai · Haipeng Luo · Chen-Yu Wei · Weiqiang Zheng -
2023 Oral: Random Classification Noise does not defeat All Convex Potential Boosters Irrespective of Model Choice »
Yishay Mansour · Richard Nock · Robert C. Williamson -
2023 Poster: Reinforcement Learning Can Be More Efficient with Multiple Rewards »
Christoph Dann · Yishay Mansour · Mehryar Mohri -
2023 Poster: Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation »
Uri Sherman · Tomer Koren · Yishay Mansour -
2023 Poster: Regret Minimization and Convergence to Equilibria in General-sum Markov Games »
Liad Erez · Tal Lancewicki · Uri Sherman · Tomer Koren · Yishay Mansour -
2023 Poster: Concurrent Shuffle Differential Privacy Under Continual Observation »
Jay Tenenbaum · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2023 Poster: Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation »
Orin Levy · Alon Cohen · Asaf Cassel · Yishay Mansour -
2023 Poster: Random Classification Noise does not defeat All Convex Potential Boosters Irrespective of Model Choice »
Yishay Mansour · Richard Nock · Robert C. Williamson -
2023 Poster: Refined Regret for Adversarial MDPs with Linear Function Approximation »
Yan Dai · Haipeng Luo · Chen-Yu Wei · Julian Zimmert -
2023 Poster: Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback »
Tal Lancewicki · Aviv Rosenberg · Dmitry Sotnikov -
2022 Poster: Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games »
Gabriele Farina · Chung-Wei Lee · Haipeng Luo · Christian Kroer -
2022 Spotlight: Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games »
Gabriele Farina · Chung-Wei Lee · Haipeng Luo · Christian Kroer -
2022 Poster: No-Regret Learning in Time-Varying Zero-Sum Games »
Mengxiao Zhang · Peng Zhao · Haipeng Luo · Zhi-Hua Zhou -
2022 Spotlight: No-Regret Learning in Time-Varying Zero-Sum Games »
Mengxiao Zhang · Peng Zhao · Haipeng Luo · Zhi-Hua Zhou -
2022 Poster: Cooperative Online Learning in Stochastic and Adversarial MDPs »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2022 Poster: FriendlyCore: Practical Differentially Private Aggregation »
Eliad Tsfadia · Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2022 Poster: Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP »
Liyu Chen · Rahul Jain · Haipeng Luo -
2022 Poster: Learning Infinite-horizon Average-reward Markov Decision Process with Constraints »
Liyu Chen · Rahul Jain · Haipeng Luo -
2022 Oral: Cooperative Online Learning in Stochastic and Adversarial MDPs »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2022 Spotlight: FriendlyCore: Practical Differentially Private Aggregation »
Eliad Tsfadia · Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2022 Spotlight: Learning Infinite-horizon Average-reward Markov Decision Process with Constraints »
Liyu Chen · Rahul Jain · Haipeng Luo -
2022 Oral: Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP »
Liyu Chen · Rahul Jain · Haipeng Luo -
2021 : Implicit Finite-Horizon Approximation for Stochastic Shortest Path »
Liyu Chen · Mehdi Jafarnia · Rahul Jain · Haipeng Luo -
2021 Poster: Differentially-Private Clustering of Easy Instances »
Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer · Eliad Tsfadia -
2021 Spotlight: Differentially-Private Clustering of Easy Instances »
Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer · Eliad Tsfadia -
2021 Poster: Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously »
Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang · Xiaojin Zhang -
2021 Poster: Adversarial Dueling Bandits »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Poster: Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case »
Liyu Chen · Haipeng Luo -
2021 Spotlight: Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case »
Liyu Chen · Haipeng Luo -
2021 Spotlight: Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously »
Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang · Xiaojin Zhang -
2021 Spotlight: Adversarial Dueling Bandits »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Poster: Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions »
Tal Lancewicki · Shahar Segal · Tomer Koren · Yishay Mansour -
2021 Spotlight: Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions »
Tal Lancewicki · Shahar Segal · Tomer Koren · Yishay Mansour -
2021 Poster: Dueling Convex Optimization »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Spotlight: Dueling Convex Optimization »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2020 Poster: Optimistic Policy Optimization with Bandit Feedback »
Lior Shani · Yonathan Efroni · Aviv Rosenberg · Shie Mannor -
2020 Poster: Near-optimal Regret Bounds for Stochastic Shortest Path »
Aviv Rosenberg · Alon Cohen · Yishay Mansour · Haim Kaplan -
2020 Poster: Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition »
Chi Jin · Tiancheng Jin · Haipeng Luo · Suvrit Sra · Tiancheng Yu -
2019 Poster: Adversarial Online Learning with noise »
Alon Resler · Yishay Mansour -
2019 Poster: Online Convex Optimization in Adversarial Markov Decision Processes »
Aviv Rosenberg · Yishay Mansour -
2019 Poster: Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret »
Alon Cohen · Tomer Koren · Yishay Mansour -
2019 Poster: Differentially Private Learning of Geometric Concepts »
Haim Kaplan · Yishay Mansour · Yossi Matias · Uri Stemmer -
2019 Oral: Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret »
Alon Cohen · Tomer Koren · Yishay Mansour -
2019 Oral: Adversarial Online Learning with noise »
Alon Resler · Yishay Mansour -
2019 Oral: Differentially Private Learning of Geometric Concepts »
Haim Kaplan · Yishay Mansour · Yossi Matias · Uri Stemmer -
2019 Oral: Online Convex Optimization in Adversarial Markov Decision Processes »
Aviv Rosenberg · Yishay Mansour -
2018 Poster: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire -
2018 Oral: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire