
Workshop: Decision Awareness in Reinforcement Learning

SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition

Dylan Slack · Yinlam Chow · Bo Dai · Nevan Wichers


Though many reinforcement learning (RL) problems involve learning policies in settings with difficult-to-specify safety constraints and sparse rewards, current methods struggle to acquire successful and safe policies. Methods that extract useful policy primitives from offline datasets using generative modeling have recently shown promise at accelerating RL in these more complex settings. However, we discover that current primitive-learning methods may not be well-equipped for safe policy learning and may promote unsafe behavior due to their tendency to ignore data from undesirable behaviors. To improve the safety of offline skill learning algorithms, we propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints. Through principled training on an offline dataset, SAFER learns to extract safe primitive skills. In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies. We theoretically characterize why SAFER can enforce safe policy learning and demonstrate its effectiveness on several complex safety-critical robotic grasping tasks inspired by the game Operation, in which SAFER outperforms baseline methods in learning successful policies and enforcing safety.
