The Power of Duality Principle in Offline Average-Reward Reinforcement Learning
Asuman Ozdaglar · Sarath Pattathil · Jiawei Zhang · Kaiqing Zhang
Offline reinforcement learning (RL) is widely used to find an optimal policy using a pre-collected dataset, without further interaction with the environment. Recent RL theory has made significant progress in developing sample-efficient offline RL algorithms under various relaxed assumptions on data coverage, with a specific focus on either infinite-horizon discounted or finite-horizon episodic Markov decision processes (MDPs). In this work, we revisit the linear programming (LP) framework and the induced duality principle for offline RL, specifically for *infinite-horizon average-reward* MDPs. By virtue of this LP formulation and the duality principle, our result achieves the near-optimal $\tilde O(1/\sqrt{n})$ rate under partial data coverage assumptions, where $n$ is the number of samples in the dataset. Our key enabler is to *relax* the equality *constraint* and introduce proper new *inequality constraints* in the dual formulation of the LP. We hope our insights can shed new light on the use of LP formulations and the induced duality principle in offline RL.
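
For context, the following is a sketch of the standard textbook LP pair for an average-reward MDP; it may differ from the exact formulation used in the paper. All symbols are assumptions not defined in the abstract: a finite state space $S$ and action space $A$, reward $r(s,a)$, transition kernel $P(s' \mid s,a)$, scalar gain $\rho$, bias function $h$, and state-action occupancy measure $\mu$. The sketch also assumes a unichain MDP, so that the optimal gain does not depend on the initial state.

$$\min_{\rho,\, h} \ \rho \quad \text{s.t.} \quad \rho + h(s) - \sum_{s'} P(s' \mid s,a)\, h(s') \ \ge\ r(s,a) \quad \forall (s,a) \in S \times A$$

$$\max_{\mu \ge 0} \ \sum_{s,a} \mu(s,a)\, r(s,a) \quad \text{s.t.} \quad \sum_{a} \mu(s',a) = \sum_{s,a} P(s' \mid s,a)\, \mu(s,a) \ \ \forall s' \in S, \qquad \sum_{s,a} \mu(s,a) = 1$$

The two programs are LP duals of each other. The equality constraints in the second program are the stationarity (flow) conditions on $\mu$; which equality constraint the paper relaxes, and which inequality constraints it adds, is not specified in the abstract.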

Author Information

Asuman Ozdaglar (Massachusetts Institute of Technology)
Sarath Pattathil (Massachusetts Institute of Technology)
Jiawei Zhang (Massachusetts Institute of Technology)
Kaiqing Zhang (University of Maryland, College Park)
