Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning
Jingduo Pan ⋅ Taoran Wu ⋅ Yiling Xue ⋅ Bai Xue
Abstract
We study stochastic minimum-cost reach-avoid reinforcement learning, in which an agent must satisfy a reach-avoid specification with probability at least $p$ while minimizing expected cumulative cost in a stochastic environment. Existing safe and constrained reinforcement learning methods typically fail to provide probabilistic reach-avoid guarantees and cost-optimality jointly in this stochastic learning setting. To address this limitation, we introduce reach-avoid probability certificates (RAPCs), which characterize the states from which the stochastic reach-avoid constraint is satisfiable. Building on RAPCs, we develop a contraction-based Bellman formulation that enables reinforcement learning to optimize cumulative cost while provably satisfying the stochastic reach-avoid constraint. We establish almost sure convergence of the proposed algorithms to locally optimal policies under the reach-avoid constraint. Experiments in the MuJoCo simulator demonstrate improved cost performance and higher reach-avoid satisfaction rates compared to existing baselines.
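As a point of reference for the kind of quantity a RAPC certifies (our notation, not necessarily the paper's exact definition): writing $V^{*}(s)$ for the maximal probability of eventually reaching a goal set $G$ while avoiding an unsafe set $U$ under transition kernel $P$, a standard reach-avoid recursion reads
% Hypothetical sketch: maximal reach-avoid probability as a Bellman fixed point.
% G (goal set), U (unsafe set), A (actions), P (transition kernel) are assumed
% notation for illustration, not taken from the paper itself.
\[
V^{*}(s) =
\begin{cases}
1, & s \in G,\\[2pt]
0, & s \in U,\\[2pt]
\displaystyle \max_{a \in A}\; \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\!\left[ V^{*}(s') \right], & \text{otherwise,}
\end{cases}
\]
so that the stochastic reach-avoid constraint is satisfiable from $s$ whenever $V^{*}(s) \ge p$. On this reading, a RAPC can be understood as a certified lower bound on $V^{*}$; the contraction-based formulation plausibly modifies the associated Bellman operator so that it admits a unique fixed point suitable for learning.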