Timezone: »
Minimax Regret for Stochastic Shortest Path
Alon Cohen · Yonathan Efroni · Yishay Mansour · Aviv Rosenberg
We study the Stochastic Shortest Path (SSP) problem in which an agent has to reach a goal state in minimum total expected cost. In the learning formulation of the problem, the agent has no prior knowledge about the costs and dynamics of the model. She repeatedly interacts with the model for $K$ episodes, and has to minimize her regret. In this work we show that the minimax regret for this setting is $\widetilde O(\sqrt{ (B_\star^2 + B_\star) |S| |A| K})$ where $B_\star$ is a bound on the expected cost of the optimal policy from any state, $S$ is the state space, and $A$ is the action space. This matches the $\Omega (\sqrt{ B_\star^2 |S| |A| K})$ lower bound of Rosenberg et al. (2020) for $B_\star \ge 1$, and improves their regret bound by a factor of $\sqrt{|S|}$. For $B_\star < 1$ we prove a matching lower bound of $\Omega (\sqrt{ B_\star |S| |A| K})$. Our algorithm is based on a novel reduction from SSP to finite-horizon MDPs. To that end, we provide an algorithm for the finite-horizon setting whose leading term in the regret depends polynomially on the expected cost of the optimal policy and only logarithmically on the horizon.
Author Information
Alon Cohen (Technion and Google)
Yonathan Efroni (Microsoft Research, New York)
Yishay Mansour (Google and Tel Aviv University)
Aviv Rosenberg (Tel Aviv University)
More from the Same Authors
-
2021 : Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure »
Aviv Rosenberg · Yishay Mansour -
2021 : Learning Adversarial Markov Decision Processes with Delayed Feedback »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2021 : Provable RL with Exogenous Distractors via Multistep Inverse Dynamics »
Yonathan Efroni · Dipendra Misra · Akshay Krishnamurthy · Alekh Agarwal · John Langford -
2021 : Sparsity in the Partially Controllable LQR »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 : Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP »
Orin Levy · Yishay Mansour -
2022 : Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback »
Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg -
2023 Oral: Random Classification Noise does not defeat All Convex Potential Boosters Irrespective of Model Choice »
Yishay Mansour · Richard Nock · Robert C. Williamson -
2023 Poster: Reinforcement Learning Can Be More Efficient with Multiple Rewards »
Christoph Dann · Yishay Mansour · Mehryar Mohri -
2023 Poster: Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation »
Uri Sherman · Tomer Koren · Yishay Mansour -
2023 Poster: Regret Minimization and Convergence to Equilibria in General-sum Markov Games »
Liad Erez · Tal Lancewicki · Uri Sherman · Tomer Koren · Yishay Mansour -
2023 Poster: Concurrent Shuffle Differential Privacy Under Continual Observation »
Jay Tenenbaum · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2023 Poster: Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation »
Orin Levy · Alon Cohen · Asaf Cassel · Yishay Mansour -
2023 Poster: Random Classification Noise does not defeat All Convex Potential Boosters Irrespective of Model Choice »
Yishay Mansour · Richard Nock · Robert C. Williamson -
2023 Poster: Principled Offline RL in the Presence of Rich Exogenous Information »
Riashat Islam · Manan Tomar · Alex Lamb · Yonathan Efroni · Hongyu Zang · Aniket Didolkar · Dipendra Misra · Xin Li · Harm Seijen · Remi Tachet des Combes · John Langford -
2023 Poster: Reward-Mixing MDPs with Few Latent Contexts are Learnable »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2023 Poster: Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback »
Tal Lancewicki · Aviv Rosenberg · Dmitry Sotnikov -
2022 : Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback »
Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg -
2022 Poster: Sparsity in Partially Controllable Linear Systems »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 Spotlight: Sparsity in Partially Controllable Linear Systems »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2022 Poster: Cooperative Online Learning in Stochastic and Adversarial MDPs »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2022 Poster: FriendlyCore: Practical Differentially Private Aggregation »
Eliad Tsfadia · Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2022 Poster: Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2022 Poster: Provable Reinforcement Learning with a Short-Term Memory »
Yonathan Efroni · Chi Jin · Akshay Krishnamurthy · Sobhan Miryoosefi -
2022 Oral: Cooperative Online Learning in Stochastic and Adversarial MDPs »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2022 Spotlight: Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2022 Spotlight: FriendlyCore: Practical Differentially Private Aggregation »
Eliad Tsfadia · Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2022 Spotlight: Provable Reinforcement Learning with a Short-Term Memory »
Yonathan Efroni · Chi Jin · Akshay Krishnamurthy · Sobhan Miryoosefi -
2021 : Sparsity in the Partially Controllable LQR »
Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang -
2021 Poster: Differentially-Private Clustering of Easy Instances »
Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer · Eliad Tsfadia -
2021 Spotlight: Differentially-Private Clustering of Easy Instances »
Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer · Eliad Tsfadia -
2021 Poster: Confidence-Budget Matching for Sequential Budgeted Learning »
Yonathan Efroni · Nadav Merlis · Aadirupa Saha · Shie Mannor -
2021 Poster: Adversarial Dueling Bandits »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Spotlight: Confidence-Budget Matching for Sequential Budgeted Learning »
Yonathan Efroni · Nadav Merlis · Aadirupa Saha · Shie Mannor -
2021 Spotlight: Adversarial Dueling Bandits »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Poster: Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions »
Tal Lancewicki · Shahar Segal · Tomer Koren · Yishay Mansour -
2021 Spotlight: Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions »
Tal Lancewicki · Shahar Segal · Tomer Koren · Yishay Mansour -
2021 Poster: Dueling Convex Optimization »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Spotlight: Dueling Convex Optimization »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2020 Poster: Optimistic Policy Optimization with Bandit Feedback »
Lior Shani · Yonathan Efroni · Aviv Rosenberg · Shie Mannor -
2020 Poster: Near-optimal Regret Bounds for Stochastic Shortest Path »
Aviv Rosenberg · Alon Cohen · Yishay Mansour · Haim Kaplan -
2020 Poster: Multi-step Greedy Reinforcement Learning Algorithms »
Manan Tomar · Yonathan Efroni · Mohammad Ghavamzadeh -
2019 Poster: Adversarial Online Learning with noise »
Alon Resler · Yishay Mansour -
2019 Poster: Online Convex Optimization in Adversarial Markov Decision Processes »
Aviv Rosenberg · Yishay Mansour -
2019 Poster: Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret »
Alon Cohen · Tomer Koren · Yishay Mansour -
2019 Poster: Differentially Private Learning of Geometric Concepts »
Haim Kaplan · Yishay Mansour · Yossi Matias · Uri Stemmer -
2019 Oral: Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret »
Alon Cohen · Tomer Koren · Yishay Mansour -
2019 Oral: Adversarial Online Learning with noise »
Alon Resler · Yishay Mansour -
2019 Oral: Differentially Private Learning of Geometric Concepts »
Haim Kaplan · Yishay Mansour · Yossi Matias · Uri Stemmer -
2019 Oral: Online Convex Optimization in Adversarial Markov Decision Processes »
Aviv Rosenberg · Yishay Mansour -
2019 Poster: Exploration Conscious Reinforcement Learning Revisited »
Lior Shani · Yonathan Efroni · Shie Mannor -
2019 Poster: Action Robust Reinforcement Learning and Applications in Continuous Control »
Chen Tessler · Chen Tessler · Yonathan Efroni · Shie Mannor -
2019 Oral: Exploration Conscious Reinforcement Learning Revisited »
Lior Shani · Yonathan Efroni · Shie Mannor -
2019 Oral: Action Robust Reinforcement Learning and Applications in Continuous Control »
Chen Tessler · Chen Tessler · Yonathan Efroni · Yonathan Efroni · Shie Mannor · Shie Mannor -
2018 Poster: Online Linear Quadratic Control »
Alon Cohen · Avinatan Hasidim · Tomer Koren · Nevena Lazic · Yishay Mansour · Kunal Talwar -
2018 Oral: Online Linear Quadratic Control »
Alon Cohen · Avinatan Hasidim · Tomer Koren · Nevena Lazic · Yishay Mansour · Kunal Talwar -
2018 Poster: Beyond the One-Step Greedy Approach in Reinforcement Learning »
Yonathan Efroni · Gal Dalal · Bruno Scherrer · Shie Mannor -
2018 Oral: Beyond the One-Step Greedy Approach in Reinforcement Learning »
Yonathan Efroni · Gal Dalal · Bruno Scherrer · Shie Mannor