Poster in Workshop: Workshop on Reinforcement Learning Theory
Online Sub-Sampling for Reinforcement Learning with General Function Approximation
Dingwen Kong · Ruslan Salakhutdinov · Ruosong Wang · Lin Yang
Abstract:
Designing provably efficient algorithms with general function approximation is an important open problem in reinforcement learning. Recently, Wang et al.~[2020c] established a value-based algorithm with general function approximation that enjoys an $\widetilde{\mathcal{O}}(\mathrm{poly}(dH)\sqrt{T})$\footnote{Throughout the paper, we use $\widetilde{\mathcal{O}}(\cdot)$ to suppress logarithmic factors.} regret bound, where $d$ depends on the complexity of the function class, $H$ is the planning horizon, and $T$ is the total number of episodes. However, their algorithm requires $\Omega(T)$ computation time per round, rendering it inefficient for practical use. In this paper, by applying online sub-sampling techniques, we develop an algorithm that takes $\widetilde{\mathcal{O}}(\mathrm{poly}(dH))$ computation time per round on average and enjoys nearly the same regret bound. Furthermore, the algorithm achieves low switching cost, i.e., it changes the policy only $\widetilde{\mathcal{O}}(\mathrm{poly}(dH))$ times during its execution, making it appealing to implement in real-life scenarios. Moreover, by using an upper-confidence-based, exploration-driven reward function, the algorithm provably explores the environment in the reward-free setting. In particular, after $\widetilde{\mathcal{O}}(\mathrm{poly}(dH)/\epsilon^2)$ rounds of exploration, the algorithm outputs an $\epsilon$-optimal policy for any given reward function.
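The core idea behind the speed-up, retaining only a small but representative sub-sample of the data so that each round's computation stays cheap, can be illustrated with a toy online sensitivity sampler. The sketch below is an illustrative assumption, not the paper's algorithm: it uses linear features and leverage-score-style sensitivities (the paper handles general function classes), and the constants `lam` and `c` are arbitrary choices. Each incoming point is kept with probability proportional to how much new information it carries, so the retained buffer grows only polylogarithmically in $T$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 2000          # feature dimension and number of rounds (toy values)
lam, c = 1.0, 2.0       # ridge regularizer and oversampling constant (assumed)

cov = lam * np.eye(d)   # covariance of the retained sub-sample
buffer = []             # retained (feature, importance weight) pairs

for t in range(T):
    phi = rng.standard_normal(d) / np.sqrt(d)   # incoming feature vector
    # "Sensitivity" of phi relative to the current sub-sample:
    # a leverage-score proxy for how novel this direction is.
    sens = float(phi @ np.linalg.solve(cov, phi))
    p = min(1.0, c * sens)
    if rng.random() < p:
        w = 1.0 / p                     # importance weight keeps estimates unbiased
        buffer.append((phi, w))
        cov += w * np.outer(phi, phi)   # sub-sample now "covers" this direction

print(f"kept {len(buffer)} of {T} points")
```

Because the sensitivity of a direction shrinks once the buffer covers it, only a vanishing fraction of points are retained, which is what makes the per-round regression over the buffer cheap compared to regressing on all $T$ data points.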