ICML 2022 On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs Spotlight

Yuanzhou Chen · Jiafan He · Quanquan Gu

Room 307

Abstract: We study reinforcement learning for infinite-horizon discounted linear kernel MDPs, where the transition probability function is linear in a predefined feature mapping. The existing UCLK algorithm \citep{zhou2020provably} for this setting has only a regret guarantee, which cannot lead to a tight sample complexity bound. In this paper, we extend the uniform-PAC sample complexity from the episodic setting to the infinite-horizon discounted setting, and propose a novel algorithm dubbed UPAC-UCLK that achieves an $\tilde{O}\big(d^2/((1-\gamma)^4\epsilon^2)+1/((1-\gamma)^6\epsilon^2)\big)$ uniform-PAC sample complexity, where $d$ is the dimension of the feature mapping, $\gamma \in(0,1)$ is the discount factor of the MDP, and $\epsilon$ is the accuracy parameter. To the best of our knowledge, this is the first $\tilde{O}(1/\epsilon^2)$ sample complexity bound for learning infinite-horizon discounted MDPs with linear function approximation (without access to the generative model).
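
As a rough illustration of how the two terms in the stated rate trade off, the sketch below evaluates each term with all constants and logarithmic factors dropped. That simplification is an assumption made purely for illustration, since the abstract gives only the $\tilde{O}$ rate; the function name `bound_terms` is hypothetical and does not come from the paper.

```python
def bound_terms(d, gamma, eps):
    """Return the two terms of the stated rate.

    Assumption: constants and logarithmic factors hidden by the
    tilde-O notation are ignored, so only relative scaling is meaningful.
    """
    horizon = 1.0 / (1.0 - gamma)  # effective horizon 1 / (1 - gamma)
    dimension_term = d**2 * horizon**4 / eps**2  # d^2 / ((1-gamma)^4 eps^2)
    horizon_term = horizon**6 / eps**2           # 1 / ((1-gamma)^6 eps^2)
    return dimension_term, horizon_term


if __name__ == "__main__":
    # Compare the two terms for a small and a larger feature dimension.
    for d in (2, 4):
        dim_t, hor_t = bound_terms(d, gamma=0.5, eps=0.5)
        print(f"d={d}: dimension term={dim_t}, horizon term={hor_t}")
```

Under this simplification the two terms coincide when $d = 1/(1-\gamma)$, and the $d^2$ term dominates for larger $d$.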
