Poster
Sample-Efficient Reinforcement Learning for POMDPs with Linear Function Approximations
Qi Cai · Zhuoran Yang · Zhaoran Wang

Wed Jul 20 03:30 PM -- 05:30 PM (PDT) @ Hall E #811
Despite the success of reinforcement learning (RL) for large-scale Markov decision processes (MDPs) with function approximation, most RL algorithms easily fail if the agent only has partial observations of the state. Such a setting is often modeled as a partially observable Markov decision process (POMDP). On the other hand, existing sample-efficient algorithms for POMDPs are restricted to the tabular setting, where the state and observation spaces are finite. In this paper, we make the first attempt at resolving such a tension between function approximation and partial observability. Specifically, we focus on a class of undercomplete POMDPs with linear function approximations, which allows the state and observation spaces to be infinite. For such a model, we show that the optimal policy and value function can be characterized by a sequence of finite-memory Bellman operators. We propose an RL algorithm that constructs optimistic estimators of these operators via RKHS embedding and uncertainty quantification. Moreover, we theoretically prove that the proposed algorithm finds an $\varepsilon$-optimal policy with $\tilde O(1/\varepsilon^2)$ episodes of exploration. This sample complexity depends only polynomially on the intrinsic dimension of the POMDP model and is independent of the size of the state and observation spaces. To the best of our knowledge, this is the first provably sample-efficient algorithm for POMDPs with function approximation.
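The abstract's core mechanism, optimistic estimation via RKHS embedding with uncertainty quantification, can be illustrated with a generic kernel ridge regression plus an uncertainty bonus. The sketch below is a hypothetical illustration of this general technique, not the paper's actual algorithm; the kernel choice and the parameters `lam`, `beta`, and `gamma` are all assumptions for the example.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared distances between rows of A and B, mapped
    # through a Gaussian (RBF) kernel. Shapes: (n, d), (m, d) -> (n, m).
    d2 = (np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * d2)

def optimistic_estimate(X, y, x_query, lam=1.0, beta=1.0, gamma=1.0):
    """Kernel ridge regression mean plus a UCB-style uncertainty bonus.

    Illustrative only: the bonus beta * sqrt(k(x,x) - k_x^T (K + lam I)^{-1} k_x)
    is the standard posterior-width term used for optimism in kernelized RL;
    it shrinks as data near x_query accumulates.
    """
    n = len(X)
    K = rbf_kernel(X, X, gamma) + lam * np.eye(n)   # regularized Gram matrix
    k_x = rbf_kernel(X, x_query, gamma)             # (n, m) cross-kernel
    mean = k_x.T @ np.linalg.solve(K, y)            # ridge-regression mean
    k_xx = np.diag(rbf_kernel(x_query, x_query, gamma))
    var = k_xx - np.sum(k_x * np.linalg.solve(K, k_x), axis=0)
    bonus = beta * np.sqrt(np.maximum(var, 0.0))    # clip tiny negatives
    return mean + bonus, bonus                      # optimistic estimate

# Example: with more observed data near the query point, the bonus shrinks,
# so exploration is directed toward poorly-covered regions.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sin(X[:, 0])
x_query = np.zeros((1, 2))
_, bonus_few = optimistic_estimate(X[:5], y[:5], x_query)
_, bonus_many = optimistic_estimate(X, y, x_query)
```

Here `bonus_many < bonus_few`: the uncertainty term contracts as coverage improves, which is the mechanism behind the $\tilde O(1/\varepsilon^2)$ exploration guarantee for UCB-style algorithms.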