Behavior-constrained policy optimization has been demonstrated to be a successful paradigm for tackling offline reinforcement learning. By exploiting historical transitions, a policy is trained to maximize a learned value function while being constrained by the behavior policy to avoid a significant distributional shift. In this paper, we propose closed-form policy improvement operators. We make the novel observation that the behavior constraint naturally motivates the use of a first-order Taylor approximation, leading to a linear approximation of the policy objective. Additionally, as practical datasets are usually collected by heterogeneous policies, we model the behavior policy as a Gaussian mixture and overcome the induced optimization difficulties by leveraging the lower bound of LogSumExp and Jensen's inequality, giving rise to a closed-form policy improvement operator. We instantiate both one-step and iterative offline RL algorithms with our novel policy improvement operators and empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark. Our code is available at https://cfpi-icml23.github.io/.
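As a minimal illustration of the idea behind a closed-form policy improvement operator (a sketch, not the paper's actual implementation), consider the simplest case the abstract describes: a single-Gaussian behavior policy with mean `mu` and covariance `Sigma`, a first-order Taylor expansion of the Q-function around `mu` with gradient `grad_q`, and a trust-region (Mahalanobis-ball) behavior constraint of radius `2 * eps`. All of these names and the specific constraint form are assumptions made for this sketch. Under them, the linearized objective admits the closed-form maximizer below.

```python
import numpy as np

def cfpi_single_gaussian(mu, Sigma, grad_q, eps):
    """Closed-form maximizer of the linearized policy objective (sketch):

        max_a  grad_q @ (a - mu)
        s.t.   (a - mu) @ inv(Sigma) @ (a - mu) <= 2 * eps

    Solving the KKT conditions gives
        a* = mu + sqrt(2 * eps / (grad_q @ Sigma @ grad_q)) * Sigma @ grad_q,
    i.e. a step from the behavior mean along the (covariance-scaled)
    Q-gradient, scaled so the constraint is tight.
    """
    direction = Sigma @ grad_q                       # covariance-preconditioned ascent direction
    scale = np.sqrt(2.0 * eps / (grad_q @ direction))  # step length that saturates the trust region
    return mu + scale * direction

# Example: unit covariance, gradient (3, 4), eps = 0.5
# -> a step of Euclidean length 1 along the normalized gradient
action = cfpi_single_gaussian(np.zeros(2), np.eye(2), np.array([3.0, 4.0]), 0.5)
# action is approximately [0.6, 0.8]
```

The heterogeneous-data case in the abstract replaces the single Gaussian with a mixture, which breaks this direct derivation; the paper's LogSumExp lower bound and Jensen's inequality are what recover a closed form there.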
Author Information
Jiachen Li (University of California, Santa Barbara)
Edwin Zhang (Harvard)
Ming Yin (UCSB/Princeton)
Jerry Bai (Horizon Robotics)
Yu-Xiang Wang (UC Santa Barbara / Amazon)
William Wang (UCSB)
More from the Same Authors
- 2021 : Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings »
  Ming Yin · Yu-Xiang Wang
- 2021 : Near-Optimal Offline Reinforcement Learning via Double Variance Reduction »
  Ming Yin · Yu Bai · Yu-Xiang Wang
- 2022 : Causal Balancing for Domain Generalization »
  Xinyi Wang · Michael Saxon · Jiachen Li · Hongyang Zhang · Kun Zhang · William Wang
- 2022 : Optimal Dynamic Regret in LQR Control »
  Dheeraj Baby · Yu-Xiang Wang
- 2023 : A Privacy-Friendly Approach to Data Valuation »
  Jiachen Wang · Yuqing Zhu · Yu-Xiang Wang · Ruoxi Jia · Prateek Mittal
- 2023 : Reasoning Ability Emerges in Large Language Models as Aggregation of Reasoning Paths »
  Xinyi Wang · William Wang
- 2023 : Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning »
  Xinyi Wang · Wanrong Zhu · Michael Saxon · Mark Steyvers · William Wang
- 2023 : Why Quantization Improves Generalization: NTK of Binary Weight Neural Network »
  Kaiqi Zhang · Ming Yin · Yu-Xiang Wang
- 2023 : Generative Autoencoders as Watermark Attackers: Analyses of Vulnerabilities and Threats »
  Xuandong Zhao · Kexun Zhang · Yu-Xiang Wang · Lei Li
- 2023 : Provable Robust Watermarking for AI-Generated Text »
  Xuandong Zhao · Prabhanjan Ananth · Lei Li · Yu-Xiang Wang
- 2023 Poster: Protecting Language Generation Models via Invisible Watermarking »
  Xuandong Zhao · Yu-Xiang Wang · Lei Li
- 2023 Poster: Differentially Private Optimization on Large Model at Small Cost »
  Zhiqi Bu · Yu-Xiang Wang · Sheng Zha · George Karypis
- 2023 Poster: Non-stationary Reinforcement Learning under General Function Approximation »
  Songtao Feng · Ming Yin · Ruiquan Huang · Yu-Xiang Wang · Jing Yang · Yingbin Liang
- 2023 Poster: Global Optimization with Parametric Function Approximation »
  Chong Liu · Yu-Xiang Wang
- 2023 Poster: ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval »
  Kexun Zhang · Xianjun Yang · William Wang · Lei Li
- 2022 Poster: Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost »
  Dan Qiao · Ming Yin · Ming Min · Yu-Xiang Wang
- 2022 Spotlight: Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost »
  Dan Qiao · Ming Yin · Ming Min · Yu-Xiang Wang
- 2022 Poster: Distributionally Robust $Q$-Learning »
  Zijian Liu · Jerry Bai · Jose Blanchet · Perry Dong · Wei Xu · Zhengqing Zhou · Zhengyuan Zhou
- 2022 Spotlight: Distributionally Robust $Q$-Learning »
  Zijian Liu · Jerry Bai · Jose Blanchet · Perry Dong · Wei Xu · Zhengqing Zhou · Zhengyuan Zhou
- 2021 Poster: Generative Particle Variational Inference via Estimation of Functional Gradients »
  Neale Ratzlaff · Jerry Bai · Fuxin Li · Wei Xu
- 2021 Spotlight: Generative Particle Variational Inference via Estimation of Functional Gradients »
  Neale Ratzlaff · Jerry Bai · Fuxin Li · Wei Xu
- 2020 Poster: Implicit Generative Modeling for Efficient Exploration »
  Neale Ratzlaff · Qinxun Bai · Fuxin Li · Wei Xu