

Poster

A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Low-Rank MDPs

Kihyuk Hong · Ambuj Tewari


Abstract: Offline reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward using a pre-collected dataset. Offline RL with low-rank MDPs or general function approximation has been widely studied recently, but existing algorithms with sample complexity $\mathcal{O}(\epsilon^{-2})$ for finding an $\epsilon$-optimal policy either require a uniform data coverage assumption or are computationally inefficient. In this paper, we propose a primal-dual algorithm for offline RL with low-rank MDPs in the discounted infinite-horizon setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves a sample complexity of $\mathcal{O}(\epsilon^{-2})$ under a partial data coverage assumption, improving upon a recent work that requires $\mathcal{O}(\epsilon^{-4})$ samples. Moreover, our algorithm extends the previous work to the offline constrained RL setting by supporting constraints on additional reward signals.
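
For context, a minimal sketch of the standard discounted constrained-MDP objective that primal-dual methods of this kind typically target is given below; the constraint signal $c$, threshold $\tau$, and dual variable $\lambda$ are generic notation for illustration and are not necessarily the exact formulation used in the paper.

$$
\max_{\pi} \; J_r(\pi) := \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^t\, r(s_t, a_t)\Big]
\quad \text{subject to} \quad
J_c(\pi) := \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^t\, c(s_t, a_t)\Big] \;\ge\; \tau .
$$

A primal-dual scheme works with the Lagrangian

$$
\mathcal{L}(\pi, \lambda) \;=\; J_r(\pi) \;+\; \lambda\,\big(J_c(\pi) - \tau\big), \qquad \lambda \ge 0,
$$

alternating updates of the primal variable (the policy $\pi$) and the dual variable $\lambda$, so that the unconstrained objective reduces to the special case $\lambda = 0$ with no additional reward signal.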
