Towards Robust and Safe Reinforcement Learning with Benign Off-policy Data
Zuxin Liu · Zijian Guo · Zhepeng Cen · Huan Zhang · Yihang Yao · Hanjiang Hu · Ding Zhao

Wed Jul 26 02:00 PM -- 03:30 PM (PDT) @ Exhibit Hall 1 #720

Previous work demonstrates that the optimal safe reinforcement learning policy in a noise-free environment is vulnerable and could be unsafe under observational attacks. While adversarial training effectively improves robustness and safety, collecting samples by attacking the behavior agent online could be expensive or prohibitively dangerous in many applications. We propose the robuSt vAriational ofF-policy lEaRning (SAFER) approach, which requires only benign training data and does not attack the agent. SAFER obtains an optimal non-parametric variational policy distribution via convex optimization and then uses it to improve the parameterized policy robustly via supervised learning. The two-stage policy optimization facilitates robust training, and extensive experiments on multiple robot platforms show the efficiency of SAFER in learning a robust and safe policy: it achieves the same reward with far fewer constraint violations during training than on-policy baselines.
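
The two-stage structure described above can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a single state with three discrete actions, known reward and cost values, and a closed-form exponential reweighting for the non-parametric step, which is a common choice in variational policy optimization. The function names, the temperature `eta`, and the cost weight `lam` are all hypothetical, and the robustness component of SAFER is omitted entirely.

```python
import numpy as np

def variational_policy(prior, q_reward, q_cost, eta=1.0, lam=1.0):
    """Stage 1 (illustrative assumption): non-parametric target distribution
    q(a|s) proportional to prior(a|s) * exp((Q_r - lam * Q_c) / eta)."""
    logits = np.log(prior) + (q_reward - lam * q_cost) / eta
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)

def softmax(theta):
    z = theta - theta.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def supervised_step(theta, target, lr=1.0):
    """Stage 2 (illustrative assumption): one gradient step on the
    cross-entropy between the target and the softmax policy.
    The gradient with respect to the logits is softmax(theta) - target."""
    return theta - lr * (softmax(theta) - target)

# Toy data (hypothetical): action 2 has the highest reward but also a high cost.
prior = np.full(3, 1.0 / 3.0)
q_reward = np.array([1.0, 2.0, 3.0])
q_cost = np.array([0.0, 0.0, 3.0])

target = variational_policy(prior, q_reward, q_cost)

theta = np.zeros(3)  # logits of the parameterized policy
for _ in range(2000):
    theta = supervised_step(theta, target)
pi = softmax(theta)
```

In this toy setting the target distribution shifts probability mass away from the high-cost action, and the parameterized policy is then fitted to that target by ordinary supervised learning, which is the separation of concerns the abstract refers to as two-stage policy optimization.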

Author Information

Zuxin Liu (Carnegie Mellon University)
Zijian Guo (Carnegie Mellon University)
Zhepeng Cen (Carnegie Mellon University)
Huan Zhang (UIUC)
Yihang Yao (Carnegie Mellon University)
Hanjiang Hu (Carnegie Mellon University)
Ding Zhao (Carnegie Mellon University)
