
Towards Safe Reinforcement Learning via Constraining Conditional Value at Risk
Chengyang Ying · Xinning Zhou · Dong Yan · Jun Zhu

Though deep reinforcement learning (DRL) has achieved substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty caused by stochastic policies and environment variability. To address this issue, we propose a novel reinforcement learning framework, CVaR-Proximal-Policy-Optimization (CPPO), which uses the conditional value-at-risk (CVaR) as a measure of risk. We show theoretically that performance degradation under observation state disturbance and transition probability disturbance depends on the range of the disturbance as well as the gap in the value function between different states. Therefore, constraining the value function among states with CVaR can improve the robustness of the policy. Experimental results on a series of continuous control tasks in MuJoCo show that CPPO achieves higher cumulative reward and exhibits stronger robustness against both observation state disturbance and transition probability disturbance in environment dynamics.
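The abstract relies on CVaR as the risk measure. As a minimal sketch, and not the authors' CPPO implementation, the lower-tail CVaR of sampled episode returns at level alpha can be estimated as the average of the worst ceil(alpha * n) returns; the Rockafellar–Uryasev objective nu - E[(nu - R)^+] / alpha gives the same value at its maximum. The function names (`empirical_var`, `empirical_cvar`, `cvar_rockafellar_uryasev`), the lower-tail sign convention, and the example returns are illustrative assumptions; the paper's exact CVaR definition and the way CPPO enforces its constraint may differ.

```python
import math
from typing import Sequence


def _check(returns: Sequence[float], alpha: float) -> None:
    # Hypothetical helper: validate the risk level and the sample.
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if len(returns) == 0:
        raise ValueError("need at least one sampled return")


def empirical_var(returns: Sequence[float], alpha: float) -> float:
    """Lower-tail value at risk: the ceil(alpha * n)-th smallest return."""
    _check(returns, alpha)
    ordered = sorted(returns)  # ascending, so the worst episodes come first
    k = max(1, math.ceil(alpha * len(ordered)))
    return ordered[k - 1]


def empirical_cvar(returns: Sequence[float], alpha: float) -> float:
    """Lower-tail CVaR: mean return over the worst ceil(alpha * n) episodes."""
    _check(returns, alpha)
    ordered = sorted(returns)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def cvar_rockafellar_uryasev(returns: Sequence[float], alpha: float, nu: float) -> float:
    """Objective nu - E[(nu - R)^+] / alpha; its maximum over nu equals the CVaR."""
    _check(returns, alpha)
    shortfall = sum(max(nu - r, 0.0) for r in returns) / len(returns)
    return nu - shortfall / alpha


if __name__ == "__main__":
    # Illustrative returns: policy A has the higher mean but a much worse tail.
    policy_a = [120.0, 120.0, 120.0, -100.0]
    policy_b = [60.0, 60.0, 60.0, 50.0]
    for name, rets in (("A", policy_a), ("B", policy_b)):
        mean = sum(rets) / len(rets)
        print(f"{name}: mean={mean}, CVaR_0.25={empirical_cvar(rets, 0.25)}")
```

Here policy A has the higher mean return (65.0 versus 57.5), but its CVaR at alpha = 0.25 is -100.0 versus 50.0 for B. A risk-constrained objective that bounds the lower tail would therefore favor B, which mirrors the motivation for CVaR stated in the abstract.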

Author Information

Chengyang Ying (Tsinghua University)
Xinning Zhou (Tsinghua University)
Dong Yan (Tsinghua University)
Jun Zhu (Tsinghua University)
