

Poster
in
Workshop: Workshop on Reinforcement Learning Theory

Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings

Ming Yin · Yu-Xiang Wang


Abstract: This work studies the statistical limits of uniform convergence for offline policy evaluation (OPE) with model-based methods in episodic MDPs, and provides a unified framework for optimal learning in several well-motivated offline tasks. We establish an $\Omega(H^2 S/d_m\epsilon^2)$ lower bound (over the model-based family) for global uniform OPE, and our main result establishes an $\widetilde{O}(H^2/d_m\epsilon^2)$ upper bound for \emph{local} uniform convergence. The key to achieving the optimal rate $\widetilde{O}(H^2/d_m\epsilon^2)$ is our design of the \emph{singleton absorbing MDP}, a new sharp analysis tool that works with the model-based approach. We generalize this model-based framework to two new settings, offline task-agnostic and offline reward-free learning, with optimal complexities $\widetilde{O}(H^2\log(K)/d_m\epsilon^2)$ ($K$ is the number of tasks) and $\widetilde{O}(H^2 S/d_m\epsilon^2)$, respectively. These results provide a unified solution for simultaneously solving different offline RL problems.
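The rates quoted in the abstract can be collected in one place. The notation follows the abstract ($H$ horizon, $S$ number of states, $\epsilon$ target accuracy, $K$ number of tasks); the reading of $d_m$ as the minimal marginal state-action occupancy of the behavior policy is an assumption based on standard offline-RL notation, not stated on this page:

```latex
% Summary of the sample-complexity rates stated in the abstract.
% Notation: H = horizon, S = number of states, \epsilon = accuracy,
% K = number of tasks, d_m = minimal marginal occupancy (assumed reading).
\begin{align*}
  \text{Global uniform OPE (lower bound):}\quad
    & \Omega\!\left(\frac{H^2 S}{d_m \epsilon^2}\right) \\
  \text{Local uniform convergence (upper bound):}\quad
    & \widetilde{O}\!\left(\frac{H^2}{d_m \epsilon^2}\right) \\
  \text{Offline task-agnostic ($K$ tasks):}\quad
    & \widetilde{O}\!\left(\frac{H^2 \log K}{d_m \epsilon^2}\right) \\
  \text{Offline reward-free:}\quad
    & \widetilde{O}\!\left(\frac{H^2 S}{d_m \epsilon^2}\right)
\end{align*}
```

Note that the matching $\Omega(H^2 S/d_m\epsilon^2)$ lower bound and $\widetilde{O}(H^2/d_m\epsilon^2)$ upper bound are not contradictory: the former is for the global uniform OPE problem, the latter for the local one, so the factor-$S$ gap reflects the difference between the two tasks.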
