We introduce a generic template for developing regret minimization algorithms in the Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as certain properties are ensured. The key to our analysis is a new technique called implicit finite-horizon approximation, which approximates the SSP model by a finite-horizon counterpart only in the analysis, without explicit implementation. Using this template, we develop two new algorithms: the first is model-free (the first of its kind in the literature, to our knowledge) and minimax optimal under strictly positive costs; the second is model-based and minimax optimal even with zero-cost state-action pairs, matching the best existing result of Tarbouriech et al. (2021b). Importantly, both algorithms admit highly sparse updates, making them computationally more efficient than all existing algorithms. Moreover, both can be made completely parameter-free.
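As a rough illustration of the finite-horizon approximation idea (which the paper applies only implicitly, in the analysis), the sketch below computes an explicit H-step approximation of the SSP optimal values by backward induction on a known tabular model. The setup and all names are hypothetical and simplified; this is not the paper's algorithm, which never constructs such an approximation explicitly.

```python
import numpy as np

def finite_horizon_values(P, c, H):
    """H-step backward induction for a tabular SSP model (illustrative sketch).

    P: transition kernel, shape (S, A, S + 1); index S is the absorbing goal state.
    c: strictly positive costs, shape (S, A).
    Returns V with shape (H + 1, S + 1), where V[h, s] is the optimal expected
    cost-to-go from state s with h steps remaining.
    """
    S, A, _ = P.shape
    V = np.zeros((H + 1, S + 1))  # zero value at the goal and at horizon 0
    for h in range(1, H + 1):
        # Q[s, a] = immediate cost + expected value with one fewer step left
        Q = c + P @ V[h - 1]
        V[h, :S] = Q.min(axis=1)  # act greedily over actions
        # V[h, S] stays 0: the goal is absorbing and cost-free
    return V
```

As H grows, these values converge (under strictly positive costs) to the SSP optimal values, which is the sense in which a finite-horizon model approximates the SSP model; the paper's contribution is to exploit this correspondence purely in the regret analysis.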
Author Information
Liyu Chen (University of Southern California)
Mehdi Jafarnia (University of Southern California)
Rahul Jain (University of Southern California)
Haipeng Luo (University of Southern California)
More from the Same Authors
- 2021 : Online Learning for Stochastic Shortest Path Model via Posterior Sampling
  Mehdi Jafarnia · Liyu Chen · Rahul Jain · Haipeng Luo
- 2021 : The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
  Tiancheng Jin · Longbo Huang · Haipeng Luo
- 2021 : Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
  Haipeng Luo · Chen-Yu Wei · Chung-Wei Lee
- 2021 : Designing Interpretable Approximations to Deep Reinforcement Learning
  Nathan Dahlin · Rahul Jain · Pierluigi Nuzzo · Krishna Kalagarla · Nikhil Naik
- 2022 : Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback
  Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg
- 2023 : Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games
  Yang Cai · Haipeng Luo · Chen-Yu Wei · Weiqiang Zheng
- 2023 Poster: Layered State Discovery for Incremental Autonomous Exploration
  Liyu Chen · Andrea Tirinzoni · Alessandro Lazaric · Matteo Pirotta
- 2023 Poster: Leveraging Demonstrations to Improve Online Learning: Quality Matters
  Botao Hao · Rahul Jain · Tor Lattimore · Benjamin Van Roy · Zheng Wen
- 2023 Poster: Refined Regret for Adversarial MDPs with Linear Function Approximation
  Yan Dai · Haipeng Luo · Chen-Yu Wei · Julian Zimmert
- 2022 Poster: Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games
  Gabriele Farina · Chung-Wei Lee · Haipeng Luo · Christian Kroer
- 2022 Spotlight: Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games
  Gabriele Farina · Chung-Wei Lee · Haipeng Luo · Christian Kroer
- 2022 Poster: No-Regret Learning in Time-Varying Zero-Sum Games
  Mengxiao Zhang · Peng Zhao · Haipeng Luo · Zhi-Hua Zhou
- 2022 Spotlight: No-Regret Learning in Time-Varying Zero-Sum Games
  Mengxiao Zhang · Peng Zhao · Haipeng Luo · Zhi-Hua Zhou
- 2022 Poster: Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP
  Liyu Chen · Rahul Jain · Haipeng Luo
- 2022 Poster: Learning Infinite-horizon Average-reward Markov Decision Process with Constraints
  Liyu Chen · Rahul Jain · Haipeng Luo
- 2022 Spotlight: Learning Infinite-horizon Average-reward Markov Decision Process with Constraints
  Liyu Chen · Rahul Jain · Haipeng Luo
- 2022 Oral: Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP
  Liyu Chen · Rahul Jain · Haipeng Luo
- 2021 : Implicit Finite-Horizon Approximation for Stochastic Shortest Path
  Liyu Chen · Mehdi Jafarnia · Rahul Jain · Haipeng Luo
- 2021 Poster: Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
  Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang · Xiaojin Zhang
- 2021 Poster: Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case
  Liyu Chen · Haipeng Luo
- 2021 Spotlight: Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case
  Liyu Chen · Haipeng Luo
- 2021 Spotlight: Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
  Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang · Xiaojin Zhang
- 2020 Poster: Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
  Chen-Yu Wei · Mehdi Jafarnia · Haipeng Luo · Hiteshi Sharma · Rahul Jain
- 2020 Poster: Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition
  Chi Jin · Tiancheng Jin · Haipeng Luo · Suvrit Sra · Tiancheng Yu
- 2018 Poster: Practical Contextual Bandits with Regression Oracles
  Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire
- 2018 Oral: Practical Contextual Bandits with Regression Oracles
  Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire