Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before they are deployed in safety-critical applications. Previous primal-dual style approaches suffer from instability issues and lack optimality guarantees. This paper overcomes these issues from the perspective of probabilistic inference. We introduce a novel Expectation-Maximization approach that naturally incorporates constraints during policy learning: 1) a provably optimal non-parametric variational distribution can be computed in closed form after a convex optimization (E-step); 2) the policy parameter is improved within the trust region based on the optimal variational distribution (M-step). The proposed algorithm decomposes the safe RL problem into a convex optimization phase and a supervised learning phase, which yields more stable training performance. A wide range of experiments on continuous robotic tasks shows that the proposed method achieves significantly better constraint satisfaction performance and better sample efficiency than baselines. The code is available at https://github.com/liuzuxin/cvpo-safe-rl.
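The two-phase structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the closed-form expression, so the exponential reweighting below (reward value minus a multiplier times cost value, divided by a temperature) is an assumption in the style of EM-based policy optimization. The names `e_step_weights`, `m_step_gaussian_mean`, `eta`, and `lam` are hypothetical, the dual variables are taken as given rather than obtained from the convex optimization, and the trust-region constraint of the M-step is omitted.

```python
import math


def e_step_weights(q_reward, q_cost, eta, lam):
    """Assumed E-step form: w_i proportional to exp((Qr_i - lam * Qc_i) / eta).

    eta (temperature) and lam (cost multiplier) are treated as given here;
    in the paper they would come from the convex optimization step.
    """
    if eta <= 0:
        raise ValueError("eta must be positive")
    scores = [(r - lam * c) / eta for r, c in zip(q_reward, q_cost)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def m_step_gaussian_mean(actions, weights):
    """Supervised-learning phase for a toy policy: weighted maximum-likelihood
    mean of a 1-D Gaussian with fixed variance (no trust region in this sketch)."""
    return sum(w * a for w, a in zip(weights, actions))


# Toy data for one state: three sampled actions with hypothetical value estimates.
actions = [-1.0, 0.0, 1.0]
q_reward = [1.0, 2.0, 3.0]  # reward value estimates
q_cost = [0.0, 0.0, 5.0]    # cost value estimates; the last action is unsafe

weights = e_step_weights(q_reward, q_cost, eta=1.0, lam=1.0)
new_mean = m_step_gaussian_mean(actions, weights)
print(weights, new_mean)
```

With a positive multiplier, the high-reward but high-cost action receives the smallest weight, so the fitted mean moves away from it. Setting `lam=0.0` recovers an unconstrained reweighting that favors the highest-reward action.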
Author Information
Zuxin Liu (Carnegie Mellon University)
Zhepeng Cen (Carnegie Mellon University)
Vladislav Isenbaev (Nuro Inc.)
Wei Liu (Nuro Inc.)
Steven Wu (Carnegie Mellon University)
Bo Li (UIUC)

Dr. Bo Li is an assistant professor in the Department of Computer Science at the University of Illinois at Urbana–Champaign. She is the recipient of the IJCAI Computers and Thought Award, Alfred P. Sloan Research Fellowship, AI’s 10 to Watch, NSF CAREER Award, MIT Technology Review TR-35 Award, Dean's Award for Excellence in Research, C.W. Gear Outstanding Junior Faculty Award, Intel Rising Star award, Symantec Research Labs Fellowship, Rising Star Award, research awards from tech companies such as Amazon, Facebook, Intel, IBM, and eBay, and best paper awards at several top machine learning and security conferences. Her research focuses on both theoretical and practical aspects of trustworthy machine learning, which is at the intersection of machine learning, security, privacy, and game theory. She has designed several scalable frameworks for trustworthy machine learning and privacy-preserving data publishing. Her work has been featured by major publications and media outlets such as Nature, Wired, Fortune, and The New York Times.
Ding Zhao (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Spotlight: Constrained Variational Policy Optimization for Safe Reinforcement Learning »
Thu. Jul 21st 06:55 -- 07:00 PM Room 327 - 329
More from the Same Authors
-
2021 : Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
· Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju -
2021 : Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic Responses »
Keegan Harris · Dung Ngo · Logan Stapleton · Hoda Heidari · Steven Wu -
2021 : Stateful Strategic Regression »
Keegan Harris · Hoda Heidari · Steven Wu -
2021 : Iterative Methods for Private Synthetic Data: Unifying Framework and New Methods »
Terrance Liu · Giuseppe Vietri · Steven Wu -
2021 : Private Multi-Task Learning: Formulation and Applications to Federated Learning »
Shengyuan Hu · Steven Wu · Virginia Smith -
2021 : Understanding Clipped FedAvg: Convergence and Client-Level Differential Privacy »
Xinwei Zhang · Xiangyi Chen · Steven Wu · Mingyi Hong -
2021 : Improved Privacy Filters and Odometers: Time-Uniform Bounds in Privacy Composition »
Justin Whitehouse · Aaditya Ramdas · Ryan Rogers · Steven Wu -
2021 : Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap »
Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2021 : Scalable Algorithms for Nonlinear Causal Inference »
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu -
2021 : Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap »
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu -
2022 : Meta-Learning Adversarial Bandits »
Nina Balcan · Keegan Harris · Mikhail Khodak · Steven Wu -
2022 : Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables »
Mengdi Xu · Peide Huang · Visak Kumar · Jielin Qiu · Chao Fang · Kuan-Hui Lee · Xuewei Qi · Henry Lam · Bo Li · Ding Zhao -
2022 : Paper 22: Multimodal Unsupervised Car Segmentation via Adaptive Aerial Image-to-Image Translation »
Haohong Lin · Zhepeng Cen · Peide Huang · Hanjiang Hu -
2022 : Paper 2: SeasonDepth: Cross-Season Monocular Depth Prediction Dataset and Benchmark under Multiple Environments »
Ding Zhao · Hitesh Arora · Jiacheng Zhu · Zuxin Liu · Wenhao Ding -
2022 : Paper 10: CausalAF: Causal Autoregressive Flow for Safety-Critical Scenes Generation »
Wenhao Ding · Haohong Lin · Bo Li · Ding Zhao · Hitesh Arora -
2023 : DiffScene: Diffusion-Based Safety-Critical Scenario Generation for Autonomous Vehicles »
Chejian Xu · Ding Zhao · Alberto Sangiovanni Vincentelli · Bo Li -
2023 : Complementing a Policy with a Different Observation Space »
Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2023 : Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation »
Wenhao Ding · Laixi Shi · Yuejie Chi · Ding Zhao -
2023 : Learning from Sparse Offline Datasets via Conservative Density Estimation »
Zhepeng Cen · Zuxin Liu · Zitong Wang · Yihang Yao · Henry Lam · Ding Zhao -
2023 : Adaptive Principal Component Regression with Applications to Panel Data »
Anish Agarwal · Keegan Harris · Justin Whitehouse · Steven Wu -
2023 : Strategyproof Decision-Making in Panel Data Settings and Beyond »
Keegan Harris · Anish Agarwal · Chara Podimata · Steven Wu -
2023 : Offline Reinforcement Learning with Imbalanced Datasets »
Li Jiang · Sijie Cheng · Jielin Qiu · Victor Chan · Ding Zhao -
2023 : Semantically Adversarial Scene Generation with Explicit Knowledge Guidance for Autonomous Driving »
Wenhao Ding · Haohong Lin · Bo Li · Ding Zhao -
2023 : Can Public Large Language Models Help Private Cross-device Federated Learning? »
Boxin Wang · Yibo J. Zhang · Yuan Cao · Bo Li · Hugh B McMahan · Sewoong Oh · Zheng Xu · Manzil Zaheer -
2023 : Strategic Apple Tasting »
Keegan Harris · Chara Podimata · Steven Wu -
2023 : Learning Shared Safety Constraints from Multi-task Demonstrations »
Konwoo Kim · Gokul Swamy · Zuxin Liu · Ding Zhao · Sanjiban Choudhury · Steven Wu -
2023 : Visual-based Policy Learning with Latent Language Encoding »
Jielin Qiu · Mengdi Xu · William Han · Bo Li · Ding Zhao -
2023 : Can Brain Signals Reveal Inner Alignment with Human Languages? »
Jielin Qiu · William Han · Jiacheng Zhu · Mengdi Xu · Douglas Weber · Bo Li · Ding Zhao -
2023 : Multimodal Representation Learning of Cardiovascular Magnetic Resonance Imaging »
Jielin Qiu · Peide Huang · Makiya Nakashima · Jaehyun Lee · Jiacheng Zhu · Wilson Tang · Pohao Chen · Christopher Nguyen · Byung-Hak Kim · Debbie Kwon · Douglas Weber · Ding Zhao · David Chen -
2023 : Robustness Verification for Perception Models against Camera Motion Perturbations »
Hanjiang Hu · Changliu Liu · Ding Zhao -
2023 Workshop: Federated Learning and Analytics in Practice: Algorithms, Systems, Applications, and Opportunities »
Zheng Xu · Peter Kairouz · Bo Li · Tian Li · John Nguyen · Jianyu Wang · Shiqiang Wang · Ayfer Ozgur -
2023 Workshop: Knowledge and Logical Reasoning in the Era of Data-driven Learning »
Nezihe Merve Gürel · Bo Li · Theodoros Rekatsinas · Beliz Gunel · Alberto Sangiovanni Vincentelli · Paroma Varma -
2023 Poster: Constrained Decision Transformer for Offline Safe Reinforcement Learning »
Zuxin Liu · Zijian Guo · Yihang Yao · Zhepeng Cen · Wenhao Yu · Tingnan Zhang · Ding Zhao -
2023 Poster: UMD: Unsupervised Model Detection for X2X Backdoor Attacks »
Zhen Xiang · Zidi Xiong · Bo Li -
2023 Poster: Fully-Adaptive Composition in Differential Privacy »
Justin Whitehouse · Aaditya Ramdas · Ryan Rogers · Steven Wu -
2023 Oral: Nonparametric Extensions of Randomized Response for Private Confidence Sets »
Ian Waudby-Smith · Steven Wu · Aaditya Ramdas -
2023 Poster: Towards Robust and Safe Reinforcement Learning with Benign Off-policy Data »
Zuxin Liu · Zijian Guo · Zhepeng Cen · Huan Zhang · Yihang Yao · Hanjiang Hu · Ding Zhao -
2023 Poster: Nonparametric Extensions of Randomized Response for Private Confidence Sets »
Ian Waudby-Smith · Steven Wu · Aaditya Ramdas -
2023 Poster: Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models »
Wenhao Ding · Tong Che · Ding Zhao · Marco Pavone -
2023 Poster: Interpolation for Robust Learning: Data Augmentation on Wasserstein Geodesics »
Jiacheng Zhu · Jielin Qiu · Aritra Guha · Zhuolin Yang · XuanLong Nguyen · Bo Li · Ding Zhao -
2023 Poster: Inverse Reinforcement Learning without Reinforcement Learning »
Gokul Swamy · David Wu · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2023 Poster: Generating Private Synthetic Data with Genetic Algorithms »
Terrance Liu · Jingwu Tang · Giuseppe Vietri · Steven Wu -
2023 Poster: Reconstructive Neuron Pruning for Backdoor Defense »
Yige Li · Xixiang Lyu · Xingjun Ma · Nodens Koren · Lingjuan Lyu · Bo Li · Yu-Gang Jiang -
2022 : Paper 15: On the Robustness of Safe Reinforcement Learning under Observational Perturbations »
Zuxin Liu · Zhepeng Cen · Huan Zhang · Jie Tan · Bo Li · Ding Zhao -
2022 : Paper 16: Constrained Model-based Reinforcement Learning via Robust Planning »
Zuxin Liu · Ding Zhao -
2022 Poster: Information Discrepancy in Strategic Learning »
Yahav Bechavod · Chara Podimata · Steven Wu · Juba Ziani -
2022 Poster: Provable Domain Generalization via Invariant-Feature Subspace Recovery »
Haoxiang Wang · Haozhe Si · Bo Li · Han Zhao -
2022 Poster: Causal Imitation Learning under Temporally Correlated Noise »
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu -
2022 Spotlight: Information Discrepancy in Strategic Learning »
Yahav Bechavod · Chara Podimata · Steven Wu · Juba Ziani -
2022 Spotlight: Provable Domain Generalization via Invariant-Feature Subspace Recovery »
Haoxiang Wang · Haozhe Si · Bo Li · Han Zhao -
2022 Oral: Causal Imitation Learning under Temporally Correlated Noise »
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu -
2022 Poster: Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic Responses »
Keegan Harris · Dung Ngo · Logan Stapleton · Hoda Heidari · Steven Wu -
2022 Poster: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning »
Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu -
2022 Poster: Improved Regret for Differentially Private Exploration in Linear MDP »
Dung Ngo · Giuseppe Vietri · Steven Wu -
2022 Poster: Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy »
Xinwei Zhang · Xiangyi Chen · Mingyi Hong · Steven Wu · Jinfeng Yi -
2022 Poster: How to Steer Your Adversary: Targeted and Efficient Model Stealing Defenses with Gradient Redirection »
Mantas Mazeika · Bo Li · David Forsyth -
2022 Poster: Adversarially Robust Models may not Transfer Better: Sufficient Conditions for Domain Transferability from the View of Regularization »
Xiaojun Xu · Yibo Zhang · Evelyn Ma · Hyun Ho Son · Sanmi Koyejo · Bo Li -
2022 Poster: Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond »
Haoxiang Wang · Bo Li · Han Zhao -
2022 Spotlight: How to Steer Your Adversary: Targeted and Efficient Model Stealing Defenses with Gradient Redirection »
Mantas Mazeika · Bo Li · David Forsyth -
2022 Spotlight: Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy »
Xinwei Zhang · Xiangyi Chen · Mingyi Hong · Steven Wu · Jinfeng Yi -
2022 Spotlight: Improved Regret for Differentially Private Exploration in Linear MDP »
Dung Ngo · Giuseppe Vietri · Steven Wu -
2022 Spotlight: Adversarially Robust Models may not Transfer Better: Sufficient Conditions for Domain Transferability from the View of Regularization »
Xiaojun Xu · Yibo Zhang · Evelyn Ma · Hyun Ho Son · Sanmi Koyejo · Bo Li -
2022 Spotlight: Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond »
Haoxiang Wang · Bo Li · Han Zhao -
2022 Spotlight: Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic Responses »
Keegan Harris · Dung Ngo · Logan Stapleton · Hoda Heidari · Steven Wu -
2022 Spotlight: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning »
Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu -
2022 Poster: Certifying Out-of-Domain Generalization for Blackbox Functions »
Maurice Weber · Linyi Li · Boxin Wang · Zhikuan Zhao · Bo Li · Ce Zhang -
2022 Poster: Double Sampling Randomized Smoothing »
Linyi Li · Jiawei Zhang · Tao Xie · Bo Li -
2022 Poster: TPC: Transformation-Specific Smoothing for Point Cloud Models »
Wenda Chu · Linyi Li · Bo Li -
2022 Spotlight: TPC: Transformation-Specific Smoothing for Point Cloud Models »
Wenda Chu · Linyi Li · Bo Li -
2022 Spotlight: Double Sampling Randomized Smoothing »
Linyi Li · Jiawei Zhang · Tao Xie · Bo Li -
2022 Spotlight: Certifying Out-of-Domain Generalization for Blackbox Functions »
Maurice Weber · Linyi Li · Boxin Wang · Zhikuan Zhao · Bo Li · Ce Zhang -
2021 : Discussion Panel #2 »
Bo Li · Nicholas Carlini · Andrzej Banburski · Kamalika Chaudhuri · Will Xiao · Cihang Xie -
2021 Workshop: A Blessing in Disguise: The Prospects and Perils of Adversarial Machine Learning »
Hang Su · Yinpeng Dong · Tianyu Pang · Eric Wong · Zico Kolter · Shuo Feng · Bo Li · Henry Liu · Dan Hendrycks · Francesco Croce · Leslie Rice · Tian Tian -
2021 Poster: Uncovering the Connections Between Adversarial Transferability and Knowledge Transferability »
Kaizhao Liang · Yibo Zhang · Boxin Wang · Zhuolin Yang · Sanmi Koyejo · Bo Li -
2021 Poster: CRFL: Certifiably Robust Federated Learning against Backdoor Attacks »
Chulin Xie · Minghao Chen · Pin-Yu Chen · Bo Li -
2021 Poster: Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation »
Jiawei Zhang · Linyi Li · Huichen Li · Xiaolu Zhang · Shuang Yang · Bo Li -
2021 Poster: Leveraging Public Data for Practical Private Query Release »
Terrance Liu · Giuseppe Vietri · Thomas Steinke · Jonathan Ullman · Steven Wu -
2021 Poster: Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation »
Haoxiang Wang · Han Zhao · Bo Li -
2021 Spotlight: Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation »
Jiawei Zhang · Linyi Li · Huichen Li · Xiaolu Zhang · Shuang Yang · Bo Li -
2021 Spotlight: Uncovering the Connections Between Adversarial Transferability and Knowledge Transferability »
Kaizhao Liang · Yibo Zhang · Boxin Wang · Zhuolin Yang · Sanmi Koyejo · Bo Li -
2021 Spotlight: Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation »
Haoxiang Wang · Han Zhao · Bo Li -
2021 Spotlight: Leveraging Public Data for Practical Private Query Release »
Terrance Liu · Giuseppe Vietri · Thomas Steinke · Jonathan Ullman · Steven Wu -
2021 Spotlight: CRFL: Certifiably Robust Federated Learning against Backdoor Attacks »
Chulin Xie · Minghao Chen · Pin-Yu Chen · Bo Li -
2021 Poster: Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks »
Nezihe Merve Gürel · Xiangyu Qi · Luka Rimanic · Ce Zhang · Bo Li -
2021 Spotlight: Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks »
Nezihe Merve Gürel · Xiangyu Qi · Luka Rimanic · Ce Zhang · Bo Li -
2021 Poster: Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap »
Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2021 Spotlight: Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap »
Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2021 Poster: Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju -
2021 Poster: Incentivizing Compliance with Algorithmic Instruments »
Dung Ngo · Logan Stapleton · Vasilis Syrgkanis · Steven Wu -
2021 Spotlight: Incentivizing Compliance with Algorithmic Instruments »
Dung Ngo · Logan Stapleton · Vasilis Syrgkanis · Steven Wu -
2021 Spotlight: Towards the Unification and Robustness of Perturbation and Gradient Based Explanations »
Sushant Agarwal · Shahin Jabbari · Chirag Agarwal · Sohini Upadhyay · Steven Wu · Hima Lakkaraju -
2020 Poster: Improving Robustness of Deep-Learning-Based Image Reconstruction »
Ankit Raj · Yoram Bresler · Bo Li