

Poster

Langevin Policy for Safe Reinforcement Learning

Fenghao Lei · Long Yang · Shiting Wen · Zhixiong Huang · Zhiwang Zhang · Chaoyi Pang

Hall C 4-9 #1104
[ Paper PDF ]
Thu 25 Jul 4:30 a.m. PDT — 6 a.m. PDT

Abstract:

Optimization-based and sampling-based algorithms are two branches of methods in machine learning. While existing safe reinforcement learning (RL) algorithms are mainly based on optimization, it remains unclear whether sampling-based methods can achieve desirable performance with a safe policy. This paper formulates the Langevin policy for safe RL and proposes Langevin Actor-Critic (LAC) to accelerate policy inference. Concretely, instead of a parametric policy, the proposed Langevin policy is a stochastic process that directly infers actions by numerically solving the continuous-time Langevin dynamics over actions. Furthermore, to make the Langevin policy practical for RL tasks, the proposed LAC accumulates the transitions induced by the Langevin policy and reproduces them with a generator. Finally, extensive empirical results show the effectiveness and superiority of LAC on MuJoCo-based and Safety Gym tasks.
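To illustrate the core idea of inferring actions with a Langevin stochastic process rather than a parametric actor, below is a minimal sketch of discretized Langevin dynamics on a critic. It is not the authors' exact algorithm: the critic `q_net`, the step sizes, the number of steps, and the clamping to a [-1, 1] action box are all illustrative assumptions.

```python
import torch

def langevin_action_inference(q_net, state, action_dim,
                              n_steps=50, step_size=1e-2, noise_scale=1e-2):
    """Sketch: infer an action by running discretized Langevin dynamics
    on Q(s, a), i.e. gradient ascent on the critic plus Gaussian noise,
    instead of querying a parametric policy network.
    Assumes q_net(state, action) returns a scalar Q-value."""
    action = torch.zeros(action_dim, requires_grad=True)
    for _ in range(n_steps):
        q_value = q_net(state, action)                  # scalar Q(s, a)
        grad = torch.autograd.grad(q_value, action)[0]  # grad_a Q(s, a)
        with torch.no_grad():
            # Langevin update: drift toward higher Q, plus injected noise.
            action += 0.5 * step_size * grad
            action += noise_scale * torch.randn_like(action)
    # Clamp to a bounded action space (assumed [-1, 1] here).
    return action.detach().clamp(-1.0, 1.0)
```

In an actor-critic loop, the transitions produced by such Langevin-inferred actions could then be stored and replayed to train a faster generator, which is the amortization step the abstract attributes to LAC.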
