

Poster in Workshop: Next Generation of AI Safety

Safer Reinforcement Learning by Going Off-policy: a Benchmark

Igor Kuznetsov

Keywords: [ Safe Reinforcement Learning ] [ Off-policy ] [ Continuous Control ]


Abstract:

Avoiding safety-constraint violations while learning to solve tasks is the central concern of Safe Reinforcement Learning (SafeRL). Most prior studies address SafeRL with on-policy algorithms, which obtain stable results at the expense of sample efficiency. In this paper, we study SafeRL from the off-policy perspective. We argue that off-policy RL algorithms are better suited for SafeRL because reducing the number of environment interactions results in fewer safety penalties incurred during training. We show that off-policy algorithms achieve better safety metrics than on-policy competitors at the same performance level, and we provide a benchmark of 6 modern off-policy algorithms tested on 30 environments from the state-of-the-art SafetyGymnasium environment set.
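The core argument is that sample efficiency translates directly into safety: each environment interaction can incur a safety cost, so an algorithm that needs fewer interactions to reach a given return accumulates less cost during training. A minimal sketch of how such accumulated cost could be measured in a Safety-Gymnasium task is shown below; the environment id, the random placeholder policy, the episode budget, and the 6-tuple step signature with a separate `cost` signal are assumptions about the `safety_gymnasium` package, not code from the paper.

```python
# Minimal sketch (assumed API, not the paper's code): accumulate the safety
# cost an agent incurs while interacting with a Safety-Gymnasium task.
import safety_gymnasium

env = safety_gymnasium.make("SafetyPointGoal1-v0")  # assumed environment id

total_cost = 0.0
for episode in range(10):  # illustrative evaluation budget
    obs, info = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        # Placeholder for a trained off-policy agent's action selection.
        action = env.action_space.sample()
        # Safety-Gymnasium returns a safety cost alongside the task reward.
        obs, reward, cost, terminated, truncated, info = env.step(action)
        total_cost += cost

print(f"Accumulated safety cost over 10 episodes: {total_cost:.1f}")
env.close()
```

Under this view, the same accumulation applied over the training interactions themselves is what an off-policy method is expected to keep lower than an on-policy one at a matched performance level.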
