Poster
Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity
Zhang Zihan · Yuan Zhou · Xiangyang Ji
Virtual
Keywords: [ Reinforcement Learning and Planning ]
Abstract:
In this paper we consider the problem of learning an ε-optimal policy for a discounted Markov Decision Process (MDP). Given an MDP with S states, A actions, a discount factor γ ∈ (0, 1), and an approximation threshold ε > 0, we provide a model-free algorithm to learn an ε-optimal policy with sample complexity Õ(SA ln(1/p) / (ε²(1−γ)^5.5)) and success probability 1 − p, where the notation Õ(·) hides poly-logarithmic factors of S, A, 1/(1−γ), and 1/ε. For small enough ε, we show an improved algorithm with sample complexity Õ(SA ln(1/p) / (ε²(1−γ)^3)). While the first bound improves upon all known model-free algorithms and model-based ones with tight dependence on S, our second algorithm beats all known sample complexity bounds and matches the information-theoretic lower bound up to logarithmic factors.
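To make the setting concrete, the sketch below runs vanilla tabular Q-learning on a hypothetical two-state discounted MDP. This is only the basic model-free template that the paper builds on, not the paper's algorithm (which adds UCB-style exploration and the clipped pseudo-regret analysis); the MDP dynamics, learning rate, and episode counts here are illustrative assumptions.

```python
import random

# Toy 2-state, 2-action discounted MDP (illustrative assumption, not from
# the paper). Action 1 in state 0 moves to the rewarding state 1; state 1
# yields reward 1 and falls back to state 0 with probability 0.1.
GAMMA = 0.9  # discount factor γ


def step(s, a, rng):
    """Sample (reward, next_state) from the toy dynamics."""
    if s == 0:
        if a == 1:
            return 0.0, 1  # move toward the rewarding state
        return 0.0, 0      # stay in the zero-reward state
    return 1.0, 0 if rng.random() < 0.1 else 1


def q_learning(episodes=5000, horizon=30, alpha=0.1, eps=0.1, seed=0):
    """Vanilla model-free Q-learning: the core update is
    Q(s,a) += α (r + γ max_a' Q(s',a') − Q(s,a)),
    performed without ever estimating the transition model."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            # ε-greedy exploration (a simple stand-in for the paper's
            # optimism-based exploration)
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            r, s2 = step(s, a, rng)
            Q[s][a] += alpha * (r + GAMMA * max(Q[s2]) - Q[s][a])
            s = s2
    return Q


Q = q_learning()
```

After training, the greedy policy derived from Q should choose action 1 in state 0, i.e. head toward the rewarding state; the sample-complexity question studied in the paper is how many such transitions are needed before the greedy policy is ε-optimal.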