Poster
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Chen-Yu Wei · Mehdi Jafarnia · Haipeng Luo · Hiteshi Sharma · Rahul Jain
Wed Jul 15 02:00 PM -- 02:45 PM & Thu Jul 16 03:00 AM -- 03:45 AM (PDT)
Model-free reinforcement learning is known to be memory- and computation-efficient and more amenable to large-scale problems. In this paper, two model-free algorithms are introduced for learning infinite-horizon average-reward Markov Decision Processes (MDPs). The first algorithm reduces the problem to the discounted-reward version and achieves $\mathcal{O}(T^{2/3})$ regret after $T$ steps, under the minimal assumption of weakly communicating MDPs. To our knowledge, this is the first model-free algorithm for general MDPs in this setting. The second algorithm makes use of recent advances in adaptive algorithms for adversarial multi-armed bandits and improves the regret to $\mathcal{O}(\sqrt{T})$, albeit with a stronger ergodic assumption. This result significantly improves over the $\mathcal{O}(T^{3/4})$ regret achieved by the only existing model-free algorithm by Abbasi-Yadkori et al. (2019) for ergodic MDPs in the infinite-horizon average-reward setting.
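As a rough illustration of why a reduction to the discounted-reward setting can be sensible, the sketch below checks a standard fact on a toy problem: when the discount factor $\gamma$ is close to 1, the scaled discounted value $(1-\gamma)V_\gamma(s)$ is close to the optimal average reward. This is not the paper's algorithm. The two-state MDP, the names `P`, `R`, `OPTIMAL_GAIN`, and the helper functions are all assumptions made up for this example, and plain value iteration with a known model stands in for the model-free learner.

```python
def discounted_value_iteration(P, R, gamma, tol=1e-12, max_iters=1_000_000):
    """Return the optimal discounted value function V_gamma as a list.

    P[s][a] is a list of next-state probabilities, R[s][a] is the reward.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iters):
        # One Bellman optimality backup for every state.
        V_new = [
            max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical 2-state, 2-action MDP (an assumption, not taken from the paper).
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to state 0
]
R = [
    [0.5, 0.0],  # state 0: staying pays 0.5, moving pays 0
    [1.0, 0.0],  # state 1: staying pays 1.0, moving pays 0
]
# Best long-run behaviour here: move to state 1 and stay, earning 1.0 per step.
OPTIMAL_GAIN = 1.0


def scaled_values(gamma):
    """Return (1 - gamma) * V_gamma(s) for every state s."""
    V = discounted_value_iteration(P, R, gamma)
    return [(1.0 - gamma) * v for v in V]


if __name__ == "__main__":
    for gamma in (0.9, 0.99, 0.999):
        print(gamma, scaled_values(gamma))
```

In this toy MDP the gap between `OPTIMAL_GAIN` and the scaled value at state 0 is exactly $1-\gamma$, so it shrinks as $\gamma$ approaches 1. How the discount factor should be tuned against the horizon $T$ to obtain the stated regret is a matter for the paper itself and is not reproduced here.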
Author Information
Chen-Yu Wei (University of Southern California)
Mehdi Jafarnia (University of Southern California)
Haipeng Luo (University of Southern California)
Hiteshi Sharma (University of Southern California)
Rahul Jain (University of Southern California)
More from the Same Authors
- 2021: Online Learning for Stochastic Shortest Path Model via Posterior Sampling
  Mehdi Jafarnia · Liyu Chen · Rahul Jain · Haipeng Luo
- 2021: Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
  Haipeng Luo · Chen-Yu Wei · Chung-Wei Lee
- 2021: Designing Interpretable Approximations to Deep Reinforcement Learning
  Nathan Dahlin · Rahul Jain · Pierluigi Nuzzo · Krishna Kalagarla · Nikhil Naik
- 2021: Implicit Finite-Horizon Approximation for Stochastic Shortest Path
  Liyu Chen · Mehdi Jafarnia · Rahul Jain · Haipeng Luo
- 2023 Poster: Best of Both Worlds Policy Optimization
  Christoph Dann · Chen-Yu Wei · Julian Zimmert
- 2023 Poster: Refined Regret for Adversarial MDPs with Linear Function Approximation
  Yan Dai · Haipeng Luo · Chen-Yu Wei · Julian Zimmert
- 2023 Poster: Leveraging Demonstrations to Improve Online Learning: Quality Matters
  Botao Hao · Rahul Jain · Tor Lattimore · Benjamin Van Roy · Zheng Wen
- 2023 Oral: Best of Both Worlds Policy Optimization
  Christoph Dann · Chen-Yu Wei · Julian Zimmert
- 2022 Poster: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning
  Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu
- 2022 Spotlight: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning
  Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu
- 2022 Poster: Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP
  Liyu Chen · Rahul Jain · Haipeng Luo
- 2022 Poster: Learning Infinite-horizon Average-reward Markov Decision Process with Constraints
  Liyu Chen · Rahul Jain · Haipeng Luo
- 2022 Poster: Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence
  Dongsheng Ding · Chen-Yu Wei · Kaiqing Zhang · Mihailo Jovanovic
- 2022 Oral: Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence
  Dongsheng Ding · Chen-Yu Wei · Kaiqing Zhang · Mihailo Jovanovic
- 2022 Spotlight: Learning Infinite-horizon Average-reward Markov Decision Process with Constraints
  Liyu Chen · Rahul Jain · Haipeng Luo
- 2022 Oral: Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP
  Liyu Chen · Rahul Jain · Haipeng Luo
- 2021 Poster: Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
  Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang · Xiaojin Zhang
- 2021 Spotlight: Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
  Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang · Xiaojin Zhang
- 2019 Poster: Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case
  Alina Beygelzimer · David Pal · Balazs Szorenyi · Devanathan Thiruvenkatachari · Chen-Yu Wei · Chicheng Zhang
- 2019 Oral: Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case
  Alina Beygelzimer · David Pal · Balazs Szorenyi · Devanathan Thiruvenkatachari · Chen-Yu Wei · Chicheng Zhang
- 2019 Poster: Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
  Julian Zimmert · Haipeng Luo · Chen-Yu Wei
- 2019 Oral: Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
  Julian Zimmert · Haipeng Luo · Chen-Yu Wei