ICML Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

Poster
in
Workshop: Workshop on Reinforcement Learning Theory

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

Ming Yin · Yu Bai · Yu-Xiang Wang

[ Abstract ]

[ Visit Poster at Spot B5 in Virtual World ]

Abstract: We consider the problem of offline reinforcement learning (RL) --- a well-motivated setting of RL that aims at policy optimization using only historical data. In this paper, we propose \emph{Off-Policy Double Variance Reduction} (OPDVR), a new variance reduction based algorithm for offline RL. Our main result shows that OPDVR provably identifies an

ϵ

$\epsilon$ -optimal policy with

\tilde{O} (H^{2} / d_{m} ϵ^{2})

$\widetilde{O}(H^2/d_m\epsilon^2)$ episodes of offline data in the finite-horizon \emph{stationary transition} setting and this improves over the best known upper bound by a factor of

H

$H$ . Moreover, we establish an information-theoretic lower bound of

Ω (H^{2} / d_{m} ϵ^{2})

$\Omega(H^2/d_m\epsilon^2)$ which certifies that OPDVR is optimal up to logarithmic factors. Lastly, we show that OPDVR also achieves rate-optimal sample complexity under alternative settings such as the finite-horizon MDPs with non-stationary transitions and the infinite horizon MDPs with discounted rewards.

Chat is not available.

Poster in Workshop: Workshop on Reinforcement Learning Theory

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

Ming Yin · Yu Bai · Yu-Xiang Wang

Poster
in
Workshop: Workshop on Reinforcement Learning Theory