ICML Online Sub-Sampling for Reinforcement Learning with General Function Approximation

Poster
in
Workshop: Workshop on Reinforcement Learning Theory

Online Sub-Sampling for Reinforcement Learning with General Function Approximation

Dingwen Kong · Ruslan Salakhutdinov · Ruosong Wang · Lin Yang

[ Abstract ]

[ Visit Poster at Spot C2 in Virtual World ]

Abstract: Designing provably efficient algorithms with general function approximation is an important open problem in reinforcement learning. Recently, Wang et al.~[2020c] establish a value-based algorithm with general function approximation that enjoys

\tilde{O} (p o l y (d H) \sqrt{K})

$\widetilde{O}(\mathrm{poly}(dH)\sqrt{K})$ \footnote{Throughout the paper, we use

\tilde{O} (\cdot)

$\widetilde{O}(\cdot)$ to suppress logarithm factors. } regret bound, where

d

$d$ depends on the complexity of the function class,

H

$H$ is the planning horizon, and

K

$K$ is the total number of episodes. However, their algorithm requires

Ω (K)

$\Omega(K)$ computation time per round, rendering the algorithm inefficient for practical use. In this paper, by applying online sub-sampling techniques, we develop an algorithm that takes

\tilde{O} (p o l y (d H))

$\widetilde{O}(\mathrm{poly}(dH))$ computation time per round on average, and enjoys nearly the same regret bound. Furthermore, the algorithm achieves low switching cost, i.e., it changes the policy only

\tilde{O} (p o l y (d H))

$\widetilde{O}(\mathrm{poly}(dH))$ times during its execution, making it appealing to be implemented in real-life scenarios. Moreover, by using an upper-confidence based exploration-driven reward function, the algorithm provably explores the environment in the reward-free setting. In particular, after

\tilde{O} (p o l y (d H)) / ϵ^{2}

$\widetilde{O}(\mathrm{poly}(dH))/\epsilon^2$ rounds of exploration, the algorithm outputs an

ϵ

$\epsilon$ -optimal policy for any given reward function.

Chat is not available.

Poster in Workshop: Workshop on Reinforcement Learning Theory

Online Sub-Sampling for Reinforcement Learning with General Function Approximation

Dingwen Kong · Ruslan Salakhutdinov · Ruosong Wang · Lin Yang

Poster
in
Workshop: Workshop on Reinforcement Learning Theory