CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee
Tengyu Xu · Yingbin Liang · Guanghui Lan
In safe reinforcement learning (SRL) problems, an agent explores the environment to maximize an expected total reward while avoiding violations of constraints on a number of expected total costs. In general, such SRL problems have nonconvex objective functions subject to multiple nonconvex constraints, and hence are very challenging to solve, particularly when a globally optimal policy is sought. Many popular SRL algorithms adopt a primal-dual structure, which relies on updating dual variables to satisfy the constraints. In contrast, we propose a primal approach, called constraint-rectified policy optimization (CRPO), which updates the policy by alternating between objective improvement and constraint satisfaction. CRPO provides a primal-type algorithmic framework for solving SRL problems, in which each policy update can take any variant of policy optimization step. To demonstrate the theoretical performance of CRPO, we adopt natural policy gradient (NPG) for each policy update step and show that CRPO achieves an $\mathcal{O}(1/\sqrt{T})$ convergence rate to the globally optimal policy in the constrained policy set and an $\mathcal{O}(1/\sqrt{T})$ error bound on constraint satisfaction. This is the first finite-time analysis of primal SRL algorithms with a global optimality guarantee. Our empirical results demonstrate that CRPO significantly outperforms existing primal-dual baseline algorithms.
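To make the alternating structure concrete, the following is a minimal Python sketch of the CRPO-style update loop described in the abstract, not the authors' implementation. The helper callables (`estimate_costs`, `reward_step`, `cost_step`), the `tolerance` parameter, and the choice of which violated constraint to rectify are assumptions; any policy-optimization step (e.g., NPG) can be plugged in.

```python
from typing import Callable, Sequence

def crpo_update_loop(
    policy,
    estimate_costs: Callable[[object], Sequence[float]],  # hypothetical: sampled estimates of J_1..J_m
    cost_limits: Sequence[float],                          # constraint thresholds d_1..d_m
    reward_step: Callable[[object], object],               # hypothetical: one policy step on the reward objective
    cost_step: Callable[[object, int], object],            # hypothetical: one policy step decreasing cost i
    num_iters: int = 1000,
    tolerance: float = 0.1,
):
    """Sketch of the alternating CRPO update: improve the reward when all cost
    constraints hold (within a tolerance), otherwise rectify one violated constraint."""
    for _ in range(num_iters):
        costs = estimate_costs(policy)
        violated = [i for i, (c, d) in enumerate(zip(costs, cost_limits))
                    if c > d + tolerance]
        if not violated:
            # every constraint J_i <= d_i is (approximately) satisfied:
            # take one policy-optimization step on the reward objective
            policy = reward_step(policy)
        else:
            # at least one constraint is violated: take one step that
            # decreases the corresponding expected total cost instead
            policy = cost_step(policy, violated[0])
    return policy
```

In this sketch the same kind of policy-optimization step is reused for both the objective and the constraints, which reflects the "any variant of policy optimization step" flexibility the abstract claims; the tolerance plays the role of the slack used to decide when a constraint counts as satisfied.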
Author Information
Tengyu Xu (The Ohio State University)
Yingbin Liang (The Ohio State University)
Guanghui Lan (Georgia Institute of Technology)
More from the Same Authors
- 2021: CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee
  Tengyu Xu · Yingbin Liang · Guanghui Lan
- 2023 Poster: Generalized-Smooth Nonconvex Optimization is As Efficient As Smooth Nonconvex Optimization
  Ziyi Chen · Yi Zhou · Yingbin Liang · Zhaosong Lu
- 2023 Poster: Theory on Forgetting and Generalization of Continual Learning
  Sen Lin · Peizhong Ju · Yingbin Liang · Ness Shroff
- 2023 Poster: Non-stationary Reinforcement Learning under General Function Approximation
  Songtao Feng · Ming Yin · Ruiquan Huang · Yu-Xiang Wang · Jing Yang · Yingbin Liang
- 2023 Poster: A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints
  Ming Shi · Yingbin Liang · Ness Shroff
- 2021 Poster: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
  Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin Liang
- 2021 Poster: CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee
  Tengyu Xu · Yingbin Liang · Guanghui Lan
- 2021 Spotlight: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
  Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin Liang
- 2021 Spotlight: CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee
  Tengyu Xu · Yingbin Liang · Guanghui Lan
- 2021 Poster: Bilevel Optimization: Convergence Analysis and Enhanced Design
  Kaiyi Ji · Junjie Yang · Yingbin Liang
- 2021 Spotlight: Bilevel Optimization: Convergence Analysis and Enhanced Design
  Kaiyi Ji · Junjie Yang · Yingbin Liang
- 2020 Poster: History-Gradient Aided Batch Size Adaptation for Variance Reduced Algorithms
  Kaiyi Ji · Zhe Wang · Bowen Weng · Yi Zhou · Wei Zhang · Yingbin Liang
- 2019 Poster: Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization
  Kaiyi Ji · Zhe Wang · Yi Zhou · Yingbin Liang
- 2019 Oral: Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization
  Kaiyi Ji · Zhe Wang · Yi Zhou · Yingbin Liang
- 2017 Poster: Conditional Accelerated Lazy Stochastic Gradient Descent
  Guanghui Lan · Sebastian Pokutta · Yi Zhou · Daniel Zink
- 2017 Talk: Conditional Accelerated Lazy Stochastic Gradient Descent
  Guanghui Lan · Sebastian Pokutta · Yi Zhou · Daniel Zink