The Power of Duality Principle in Offline Average-Reward Reinforcement Learning
Asuman Ozdaglar · Sarath Pattathil · Jiawei Zhang · Kaiqing Zhang
Offline reinforcement learning (RL) aims to find an optimal policy from a pre-collected dataset, without further interaction with the environment. Recent RL theory has made significant progress in developing sample-efficient offline RL algorithms under various relaxed assumptions on data coverage, with a specific focus on either infinite-horizon discounted or finite-horizon episodic Markov decision processes (MDPs). In this work, we revisit the LP framework and the induced duality principle for offline RL, specifically for *infinite-horizon average-reward* MDPs. By virtue of this LP formulation and the duality principle, our result achieves the $\tilde O(1/\sqrt{n})$ near-optimal rate under partial data coverage assumptions. Our key enabler is to *relax* the equality *constraint* and introduce proper new *inequality constraints* in the dual formulation of the LP. We hope our insights can shed new light on the use of LP formulations and the induced duality principle in offline RL.
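For context, a minimal sketch of the classical primal-dual LP pair for an average-reward MDP (the standard textbook form in the unichain case; the paper's actual contribution modifies the dual's constraints, which is not reproduced here):

```latex
% Standard LP pair for an average-reward MDP (unichain case).
% Primal: over stationary state-action occupancy measures \mu(s,a) >= 0.
\begin{align*}
  \max_{\mu \ge 0}\quad & \sum_{s,a} \mu(s,a)\, r(s,a) \\
  \text{s.t.}\quad & \sum_{a} \mu(s',a) \;=\; \sum_{s,a} P(s' \mid s,a)\, \mu(s,a) \quad \forall s', \\
  & \sum_{s,a} \mu(s,a) \;=\; 1.
\end{align*}
% Dual: over the optimal average reward \rho and a bias (relative value) function v.
\begin{align*}
  \min_{\rho,\, v}\quad & \rho \\
  \text{s.t.}\quad & \rho + v(s) - \sum_{s'} P(s' \mid s,a)\, v(s') \;\ge\; r(s,a) \quad \forall (s,a).
\end{align*}
```

An optimal occupancy measure $\mu^\star$ of the primal induces an optimal policy via $\pi^\star(a \mid s) \propto \mu^\star(s,a)$, which is the sense in which the LP formulation and its duality principle drive the offline analysis.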
Author Information
Asuman Ozdaglar (MIT)
Sarath Pattathil (Massachusetts Institute of Technology)
Jiawei Zhang (Massachusetts Institute of Technology)
Kaiqing Zhang (University of Maryland, College Park)
More from the Same Authors
- 2021: Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity
  Kaiqing Zhang · Xiangyuan Zhang · Bin Hu · Tamer Basar
- 2021: Decentralized Q-Learning in Zero-sum Markov Games
  Kaiqing Zhang · David Leslie · Tamer Basar · Asuman Ozdaglar
- 2023: Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation
  Qiwen Cui · Kaiqing Zhang · Simon Du
- 2023: Toward Understanding Latent Model Learning in MuZero: A Case Study in Linear Quadratic Gaussian Control
  Yi Tian · Kaiqing Zhang · Russ Tedrake · Suvrit Sra
- 2023: Time-Reversed Dissipation Induces Duality Between Minimizing Gradient Norm and Function Value
  Jaeyeon Kim · Asuman Ozdaglar · Chanwoo Park · Ernest Ryu
- 2023 Poster: Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing
  Xiangyu Liu · Kaiqing Zhang
- 2023 Poster: Linearly Constrained Bilevel Optimization: A Smoothed Implicit Gradient Approach
  Prashant Khanduri · Ioannis Tsaknakis · Yihua Zhang · Jia Liu · Sijia Liu · Jiawei Zhang · Mingyi Hong
- 2023 Poster: Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation
  Asuman Ozdaglar · Sarath Pattathil · Jiawei Zhang · Kaiqing Zhang
- 2022: What is a Good Metric to Study Generalization of Minimax Learners?
  Asuman Ozdaglar · Sarath Pattathil · Jiawei Zhang · Kaiqing Zhang
- 2022 Poster: On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning
  Weichao Mao · Lin Yang · Kaiqing Zhang · Tamer Basar
- 2022 Poster: Do Differentiable Simulators Give Better Policy Gradients?
  Hyung Ju Suh · Max Simchowitz · Kaiqing Zhang · Russ Tedrake
- 2022 Spotlight: On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning
  Weichao Mao · Lin Yang · Kaiqing Zhang · Tamer Basar
- 2022 Oral: Do Differentiable Simulators Give Better Policy Gradients?
  Hyung Ju Suh · Max Simchowitz · Kaiqing Zhang · Russ Tedrake
- 2022 Poster: Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence
  Dongsheng Ding · Chen-Yu Wei · Kaiqing Zhang · Mihailo Jovanovic
- 2022 Oral: Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence
  Dongsheng Ding · Chen-Yu Wei · Kaiqing Zhang · Mihailo Jovanovic
- 2021 Poster: Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs
  Weichao Mao · Kaiqing Zhang · Ruihao Zhu · David Simchi-Levi · Tamer Basar
- 2021 Poster: Train simultaneously, generalize better: Stability of gradient-based minimax learners
  Farzan Farnia · Asuman Ozdaglar
- 2021 Spotlight: Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs
  Weichao Mao · Kaiqing Zhang · Ruihao Zhu · David Simchi-Levi · Tamer Basar
- 2021 Spotlight: Train simultaneously, generalize better: Stability of gradient-based minimax learners
  Farzan Farnia · Asuman Ozdaglar
- 2021 Poster: A Wasserstein Minimax Framework for Mixed Linear Regression
  Theo Diamandis · Yonina Eldar · Alireza Fallah · Farzan Farnia · Asuman Ozdaglar
- 2021 Oral: A Wasserstein Minimax Framework for Mixed Linear Regression
  Theo Diamandis · Yonina Eldar · Alireza Fallah · Farzan Farnia · Asuman Ozdaglar
- 2021 Poster: Reinforcement Learning for Cost-Aware Markov Decision Processes
  Wesley A Suttle · Kaiqing Zhang · Zhuoran Yang · Ji Liu · David N Kraemer
- 2021 Spotlight: Reinforcement Learning for Cost-Aware Markov Decision Processes
  Wesley A Suttle · Kaiqing Zhang · Zhuoran Yang · Ji Liu · David N Kraemer
- 2020 Poster: Do GANs always have Nash equilibria?
  Farzan Farnia · Asuman Ozdaglar
- 2018 Poster: Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents
  Kaiqing Zhang · Zhuoran Yang · Han Liu · Tong Zhang · Tamer Basar
- 2018 Oral: Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents
  Kaiqing Zhang · Zhuoran Yang · Han Liu · Tong Zhang · Tamer Basar