

Poster in Workshop: Duality Principles for Modern Machine Learning

The Power of Duality Principle in Offline Average-Reward Reinforcement Learning

Asuman Ozdaglar · Sarath Pattathil · Jiawei Zhang · Kaiqing Zhang

Keywords: [ Average Reward MDP ] [ Linear Programming ]


Abstract: Offline reinforcement learning (RL) is widely used to find an optimal policy using a pre-collected dataset, without further interaction with the environment. Recent RL theory has made significant progress in developing sample-efficient offline RL algorithms under various relaxed assumptions on data coverage, focusing on either infinite-horizon discounted or finite-horizon episodic Markov decision processes (MDPs). In this work, we revisit the linear programming (LP) framework and the induced duality principle for offline RL, specifically for *infinite-horizon average-reward* MDPs. By virtue of this LP formulation and the duality principle, our result achieves the near-optimal $\tilde O(1/\sqrt{n})$ rate under partial data coverage assumptions. Our key enabler is to *relax* the equality *constraints* and introduce proper new *inequality constraints* in the dual formulation of the LP. We hope our insights can shed new light on the use of LP formulations and the induced duality principle in offline RL.
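For context, a minimal sketch of the standard occupancy-measure LP for average-reward MDPs (standard textbook notation, not necessarily the paper's exact formulation): $d(s,a)$ denotes the stationary state-action occupancy measure, $r$ the reward, $P$ the transition kernel, $\rho$ the gain, and $h$ the bias. The flow-balance equalities below are the kind of equality constraints the abstract refers to relaxing.

$$
\max_{d \ge 0}\ \sum_{s,a} d(s,a)\, r(s,a)
\quad \text{s.t.}\quad
\sum_{a} d(s',a) = \sum_{s,a} P(s' \mid s,a)\, d(s,a)\ \ \forall s',
\qquad
\sum_{s,a} d(s,a) = 1,
$$

with the corresponding dual

$$
\min_{\rho,\, h}\ \rho
\quad \text{s.t.}\quad
\rho + h(s) \ \ge\ r(s,a) + \sum_{s'} P(s' \mid s,a)\, h(s')\ \ \forall (s,a).
$$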
