

Poster

On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs

Yuanzhou Chen · Jiafan He · Quanquan Gu

Hall E #807

Keywords: [ RL: Function Approximation ] [ RL: Online ] [ RL: Discounted Cost/Reward ]


Abstract: We study reinforcement learning for infinite-horizon discounted linear kernel MDPs, where the transition probability function is linear in a predefined feature mapping. The existing UCLK algorithm \citep{zhou2020provably} for this setting only has a regret guarantee, which cannot be translated into a tight sample complexity bound. In this paper, we extend uniform-PAC sample complexity analysis from the episodic setting to the infinite-horizon discounted setting, and propose a novel algorithm dubbed UPAC-UCLK that achieves an $\tilde{O}\big(d^2/((1-\gamma)^4\epsilon^2) + 1/((1-\gamma)^6\epsilon^2)\big)$ uniform-PAC sample complexity, where $d$ is the dimension of the feature mapping, $\gamma \in (0,1)$ is the discount factor of the MDP, and $\epsilon$ is the accuracy parameter. To the best of our knowledge, this is the first $\tilde{O}(1/\epsilon^2)$ sample complexity bound for learning infinite-horizon discounted MDPs with linear function approximation (without access to a generative model).
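
For readability, the sample complexity bound stated above can be written in display form. This is only a restatement of the abstract's expression with the same symbols; it introduces no new claims.

```latex
% Uniform-PAC sample complexity claimed for UPAC-UCLK (restated from the abstract).
% d: dimension of the feature mapping, \gamma: discount factor, \epsilon: accuracy parameter.
\[
  \widetilde{O}\!\left( \frac{d^{2}}{(1-\gamma)^{4}\,\epsilon^{2}}
                      + \frac{1}{(1-\gamma)^{6}\,\epsilon^{2}} \right)
\]
```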
