Offline reinforcement learning (RL) aims to find an optimal policy for sequential decision-making using a pre-collected dataset, without further interaction with the environment. Recent theoretical progress has focused on developing sample-efficient offline RL algorithms under various relaxed assumptions on data coverage and function approximators, especially for handling excessively large state-action spaces. Among them, the framework based on the linear-programming (LP) reformulation of Markov decision processes has shown promise: it enables sample-efficient offline RL with function approximation, under only partial data coverage and realizability assumptions on the function classes, while remaining computationally tractable. In this work, we revisit the LP framework for offline RL and provide a new reformulation that advances the existing results in several aspects, relaxing certain assumptions and achieving optimal statistical rates in terms of sample size. Our key enabler is to introduce proper constraints in the reformulation, instead of the regularization used in the literature, together with careful choices of the function classes and initial state distributions. We hope our insights further shed light on the use of LP formulations, and the induced primal-dual minimax optimization, in offline RL.
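For readers less familiar with the framework, the following is a minimal sketch of the standard (textbook) LP reformulation of a discounted MDP that this line of work builds on; the notation (initial state distribution mu_0, transition kernel P, reward r, discount gamma, value function V, occupancy measure d) is ours, and the paper's new constrained reformulation is not reproduced here.

% Textbook LP reformulation of a discounted MDP and the induced primal-dual minimax problem
% (a standard illustration, not the paper's specific reformulation).
\begin{align*}
  \text{(primal)}\quad & \min_{V}\ (1-\gamma)\,\mathbb{E}_{s\sim\mu_0}[V(s)]
    \quad \text{s.t.}\ \ V(s) \ge r(s,a) + \gamma\,\mathbb{E}_{s'\sim P(\cdot\mid s,a)}[V(s')]\ \ \forall (s,a), \\
  \text{(dual)}\quad & \max_{d\ge 0}\ \sum_{s,a} d(s,a)\,r(s,a)
    \quad \text{s.t.}\ \ \sum_{a} d(s,a) = (1-\gamma)\,\mu_0(s) + \gamma \sum_{s',a'} P(s\mid s',a')\,d(s',a')\ \ \forall s, \\
  \text{(minimax)}\quad & \min_{V}\,\max_{d\ge 0}\ (1-\gamma)\,\mathbb{E}_{s\sim\mu_0}[V(s)]
    + \sum_{s,a} d(s,a)\,\big(r(s,a) + \gamma\,\mathbb{E}_{s'\sim P(\cdot\mid s,a)}[V(s')] - V(s)\big).
\end{align*}

Here the dual variable d is the discounted state-action occupancy measure, and an optimal policy can be read off as \pi^*(a\mid s) \propto d^*(s,a). Offline methods in this framework typically replace the exact expectations with estimates from the dataset and restrict V and d to function classes, which is where coverage and realizability assumptions enter.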
Author Information
Asuman Ozdaglar (Massachusetts Institute of Technology)
Sarath Pattathil (Massachusetts Institute of Technology)
Jiawei Zhang (Massachusetts Institute of Technology)
Kaiqing Zhang (University of Maryland, College Park)
More from the Same Authors
- 2021 : Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity »
  Kaiqing Zhang · Xiangyuan Zhang · Bin Hu · Tamer Basar
- 2021 : Decentralized Q-Learning in Zero-sum Markov Games »
  Kaiqing Zhang · David Leslie · Tamer Basar · Asuman Ozdaglar
- 2023 : Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation »
  Qiwen Cui · Kaiqing Zhang · Simon Du
- 2023 : Toward Understanding Latent Model Learning in MuZero: A Case Study in Linear Quadratic Gaussian Control »
  Yi Tian · Kaiqing Zhang · Russ Tedrake · Suvrit Sra
- 2023 : The Power of Duality Principle in Offline Average-Reward Reinforcement Learning »
  Asuman Ozdaglar · Sarath Pattathil · Jiawei Zhang · Kaiqing Zhang
- 2023 : Time-Reversed Dissipation Induces Duality Between Minimizing Gradient Norm and Function Value »
  Jaeyeon Kim · Asuman Ozdaglar · Chanwoo Park · Ernest Ryu
- 2023 Poster: Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing »
  Xiangyu Liu · Kaiqing Zhang
- 2023 Poster: Linearly Constrained Bilevel Optimization: A Smoothed Implicit Gradient Approach »
  Prashant Khanduri · Ioannis Tsaknakis · Yihua Zhang · Jia Liu · Sijia Liu · Jiawei Zhang · Mingyi Hong
- 2022 : What is a Good Metric to Study Generalization of Minimax Learners? »
  Asuman Ozdaglar · Sarath Pattathil · Jiawei Zhang · Kaiqing Zhang
- 2022 Poster: On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning »
  Weichao Mao · Lin Yang · Kaiqing Zhang · Tamer Basar
- 2022 Poster: Do Differentiable Simulators Give Better Policy Gradients? »
  Hyung Ju Suh · Max Simchowitz · Kaiqing Zhang · Russ Tedrake
- 2022 Spotlight: On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning »
  Weichao Mao · Lin Yang · Kaiqing Zhang · Tamer Basar
- 2022 Oral: Do Differentiable Simulators Give Better Policy Gradients? »
  Hyung Ju Suh · Max Simchowitz · Kaiqing Zhang · Russ Tedrake
- 2022 Poster: Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence »
  Dongsheng Ding · Chen-Yu Wei · Kaiqing Zhang · Mihailo Jovanovic
- 2022 Oral: Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence »
  Dongsheng Ding · Chen-Yu Wei · Kaiqing Zhang · Mihailo Jovanovic
- 2021 Poster: Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs »
  Weichao Mao · Kaiqing Zhang · Ruihao Zhu · David Simchi-Levi · Tamer Basar
- 2021 Poster: Train simultaneously, generalize better: Stability of gradient-based minimax learners »
  Farzan Farnia · Asuman Ozdaglar
- 2021 Spotlight: Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs »
  Weichao Mao · Kaiqing Zhang · Ruihao Zhu · David Simchi-Levi · Tamer Basar
- 2021 Spotlight: Train simultaneously, generalize better: Stability of gradient-based minimax learners »
  Farzan Farnia · Asuman Ozdaglar
- 2021 Poster: A Wasserstein Minimax Framework for Mixed Linear Regression »
  Theo Diamandis · Yonina Eldar · Alireza Fallah · Farzan Farnia · Asuman Ozdaglar
- 2021 Oral: A Wasserstein Minimax Framework for Mixed Linear Regression »
  Theo Diamandis · Yonina Eldar · Alireza Fallah · Farzan Farnia · Asuman Ozdaglar
- 2021 Poster: Reinforcement Learning for Cost-Aware Markov Decision Processes »
  Wesley A Suttle · Kaiqing Zhang · Zhuoran Yang · Ji Liu · David N Kraemer
- 2021 Spotlight: Reinforcement Learning for Cost-Aware Markov Decision Processes »
  Wesley A Suttle · Kaiqing Zhang · Zhuoran Yang · Ji Liu · David N Kraemer
- 2020 Poster: Do GANs always have Nash equilibria? »
  Farzan Farnia · Asuman Ozdaglar
- 2018 Poster: Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents »
  Kaiqing Zhang · Zhuoran Yang · Han Liu · Tong Zhang · Tamer Basar
- 2018 Oral: Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents »
  Kaiqing Zhang · Zhuoran Yang · Han Liu · Tong Zhang · Tamer Basar