ICML Poster On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs

Poster

On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs

Yuanzhou Chen · Jiafan He · Quanquan Gu

Hall E #807

Keywords: [ RL: Function Approximation ] [ RL: Online ] [ RL: Discounted Cost/Reward ]

[ Abstract ]

[ Paper PDF]

Abstract: We study reinforcement learning for infinite-horizon discounted linear kernel MDPs, where the transition probability function is linear in a predefined feature mapping. Existing UCLK \citep{zhou2020provably} algorithm for this setting only has a regret guarantee, which cannot lead to a tight sample complexity bound. In this paper, we extend the uniform-PAC sample complexity from episodic setting to the infinite-horizon discounted setting, and propose a novel algorithm dubbed UPAC-UCLK that achieves an

\TildeO(d2/((1−γ)4ϵ2)+1/((1−γ)6ϵ2))

$\Tilde{O}\big(d^2/((1-\gamma)^4\epsilon^2)+1/((1-\gamma)^6\epsilon^2)\big)$ uniform-PAC sample complexity, where

d

$d$ is the dimension of the feature mapping,

γ \in (0, 1)

$\gamma \in(0,1)$ is the discount factor of the MDP and

ϵ

$\epsilon$ is the accuracy parameter. To the best of our knowledge, this is the first

~ O (1 / ϵ^{2})

$\tilde{O}(1/\epsilon^2)$ sample complexity bound for learning infinite-horizon discounted MDPs with linear function approximation (without access to the generative model).

Chat is not available.