

Poster in Workshop: Workshop on Reinforcement Learning Theory

Online Sub-Sampling for Reinforcement Learning with General Function Approximation

Dingwen Kong · Ruslan Salakhutdinov · Ruosong Wang · Lin Yang


Abstract: Designing provably efficient algorithms with general function approximation is an important open problem in reinforcement learning. Recently, Wang et al. [2020c] established a value-based algorithm with general function approximation that enjoys an $\widetilde{O}(\mathrm{poly}(dH)\sqrt{K})$ regret bound (throughout the paper, $\widetilde{O}(\cdot)$ suppresses logarithmic factors), where $d$ depends on the complexity of the function class, $H$ is the planning horizon, and $K$ is the total number of episodes. However, their algorithm requires $\Omega(K)$ computation time per round, rendering it inefficient in practice. In this paper, by applying online sub-sampling techniques, we develop an algorithm that takes $\widetilde{O}(\mathrm{poly}(dH))$ computation time per round on average and enjoys nearly the same regret bound. Furthermore, the algorithm achieves low switching cost, i.e., it changes its policy only $\widetilde{O}(\mathrm{poly}(dH))$ times during its execution, making it appealing to implement in real-life scenarios. Moreover, by using an upper-confidence-based, exploration-driven reward function, the algorithm provably explores the environment in the reward-free setting. In particular, after $\widetilde{O}(\mathrm{poly}(dH))/\epsilon^2$ rounds of exploration, the algorithm outputs an $\epsilon$-optimal policy for any given reward function.
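The computational idea described in the abstract is to maintain a small, adaptively sub-sampled dataset of past observations and to re-solve for the value function (and hence switch the policy) only when that dataset changes. The sketch below is a minimal illustration of that principle, assuming linear features of dimension $d$ so that the sensitivity of a new point reduces to a ridge leverage score; the class name OnlineSubSampler, the oversample rate, and the importance-weighting scheme are illustrative assumptions, not the paper's exact construction, which handles general function classes.

import numpy as np

class OnlineSubSampler:
    """Sketch of sensitivity-based online sub-sampling (linear special case).

    The paper defines sensitivity with respect to a general function class;
    here it is instantiated as a ridge leverage score over d-dimensional
    features. All parameter choices below are illustrative.
    """

    def __init__(self, d, lam=1.0, oversample=10.0, seed=0):
        self.oversample = oversample        # sampling-rate multiplier (absorbs log factors)
        self.cov = lam * np.eye(d)          # regularized covariance of the kept subsample
        self.subsample = []                 # retained (importance-weighted) feature vectors
        self.rng = np.random.default_rng(seed)

    def sensitivity(self, phi):
        # Ridge leverage score of phi w.r.t. the current subsample:
        # phi^T Cov^{-1} phi (Cov already includes the lam*I regularizer).
        # Large iff phi carries information not yet covered by the subsample.
        return float(phi @ np.linalg.solve(self.cov, phi))

    def observe(self, phi):
        """Process one feature vector; return True iff the subsample changed.

        Most rounds leave the subsample untouched, which is what keeps both
        the per-round computation and the switching cost small.
        """
        p = min(1.0, self.oversample * self.sensitivity(phi))
        if self.rng.random() < p:
            self.subsample.append(phi / np.sqrt(p))   # importance weight 1/sqrt(p)
            self.cov += np.outer(phi, phi) / p
            return True
        return False

In a full agent, observe() would be called on each new state-action feature, and a True return would be the trigger for re-fitting the value estimate, so the number of policy switches tracks the small number of subsample updates rather than the number of episodes $K$:

sampler = OnlineSubSampler(d=8)
rng = np.random.default_rng(1)
switches = 0
for _ in range(10000):
    phi = rng.normal(size=8)
    if sampler.observe(phi):      # only here would the policy be recomputed
        switches += 1
# switches grows polylogarithmically in the number of rounds, not linearly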
