Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Chen-Yu Wei · Mehdi Jafarnia · Haipeng Luo · Hiteshi Sharma · Rahul Jain

Wed Jul 15 02:00 PM -- 02:45 PM & Thu Jul 16 03:00 AM -- 03:45 AM (PDT)
Model-free reinforcement learning is known to be memory- and computation-efficient and more amenable to large-scale problems. In this paper, two model-free algorithms are introduced for learning infinite-horizon average-reward Markov Decision Processes (MDPs). The first algorithm reduces the problem to the discounted-reward version and achieves $\mathcal{O}(T^{2/3})$ regret after $T$ steps, under the minimal assumption of weakly communicating MDPs. To our knowledge, this is the first model-free algorithm for general MDPs in this setting. The second algorithm makes use of recent advances in adaptive algorithms for adversarial multi-armed bandits and improves the regret to $\mathcal{O}(\sqrt{T})$, albeit with a stronger ergodic assumption. This result significantly improves over the $\mathcal{O}(T^{3/4})$ regret achieved by the only existing model-free algorithm by Abbasi-Yadkori et al. (2019) for ergodic MDPs in the infinite-horizon average-reward setting.
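
To give a feel for the gap between the three regret rates quoted above, the following is a minimal sketch that evaluates $T^{1/2}$, $T^{2/3}$ and $T^{3/4}$ at a given horizon. It is an illustration only: it assumes all constants, logarithmic factors, and dependence on MDP parameters are dropped, and the names `RATE_EXPONENTS`, `regret_rate`, and `compare_rates` are hypothetical, not taken from the paper.

```python
# Illustrative only: compares the growth of the regret rates from the abstract.
# Assumption: constants, log factors, and MDP-dependent terms are all ignored.

RATE_EXPONENTS = {
    "sqrt_T (second algorithm, ergodic MDPs)": 1 / 2,
    "T^(2/3) (first algorithm, weakly communicating MDPs)": 2 / 3,
    "T^(3/4) (Abbasi-Yadkori et al., 2019, ergodic MDPs)": 3 / 4,
}


def regret_rate(horizon: int, exponent: float) -> float:
    """Return horizon ** exponent, the bare growth rate of a regret bound."""
    if horizon <= 0:
        raise ValueError("horizon must be a positive number of steps")
    return float(horizon) ** exponent


def compare_rates(horizon: int) -> dict:
    """Evaluate every rate in RATE_EXPONENTS at the given horizon."""
    return {
        name: regret_rate(horizon, exponent)
        for name, exponent in RATE_EXPONENTS.items()
    }


if __name__ == "__main__":
    for name, value in compare_rates(10**6).items():
        print(f"{name}: {value:,.0f}")
```

At $T = 10^6$ the three rates are roughly 1,000, 10,000 and 31,600 respectively, so under these simplifications the $\sqrt{T}$ bound is over thirty times smaller than the $T^{3/4}$ bound at that horizon.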

Author Information

Chen-Yu Wei (University of Southern California)
Mehdi Jafarnia (University of Southern California)
Haipeng Luo (University of Southern California)
Hiteshi Sharma (University of Southern California)
Rahul Jain (University of Southern California)

More from the Same Authors