On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs
Yuanzhou Chen · Jiafan He · Quanquan Gu

Wed Jul 20 11:25 AM -- 11:30 AM (PDT) @ Room 307
We study reinforcement learning for infinite-horizon discounted linear kernel MDPs, where the transition probability function is linear in a predefined feature mapping. The existing UCLK algorithm \citep{zhou2020provably} for this setting only has a regret guarantee, which cannot lead to a tight sample complexity bound. In this paper, we extend the uniform-PAC sample complexity from the episodic setting to the infinite-horizon discounted setting, and propose a novel algorithm, dubbed UPAC-UCLK, that achieves an $\tilde{O}\big(d^2/((1-\gamma)^4\epsilon^2)+1/((1-\gamma)^6\epsilon^2)\big)$ uniform-PAC sample complexity, where $d$ is the dimension of the feature mapping, $\gamma \in (0,1)$ is the discount factor of the MDP, and $\epsilon$ is the accuracy parameter. To the best of our knowledge, this is the first $\tilde{O}(1/\epsilon^2)$ sample complexity bound for learning infinite-horizon discounted MDPs with linear function approximation (without access to a generative model).
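For context, the abstract's phrase "the transition probability function is linear in a predefined feature mapping" refers to the linear kernel (also called linear mixture) MDP model; the notation below follows the standard formulation in \citet{zhou2020provably} and is not spelled out in the abstract itself:

```latex
% Linear kernel (linear mixture) MDP: the transition kernel factors
% through a known d-dimensional feature map over (s', s, a) triples,
% with an unknown parameter vector theta*.
\mathbb{P}(s' \mid s, a) = \big\langle \phi(s', s, a),\, \theta^* \big\rangle,
\qquad \phi : \mathcal{S} \times \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d,
\quad \theta^* \in \mathbb{R}^d .
```

Here $\phi$ is known to the learner while $\theta^*$ must be estimated, which is why the sample complexity scales with the feature dimension $d$ rather than with the sizes of the state and action spaces.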

Author Information

Yuanzhou Chen (Peking University)
Jiafan He (University of California, Los Angeles)
Quanquan Gu (University of California, Los Angeles)
