Poster
QPRL: Learning Optimal Policies with Quasi-Potential Functions for Asymmetric Traversal
Jumman Hossain · Nirmalya Roy
West Exhibition Hall B2-B3 #W-700
Many real-world tasks—like a delivery robot climbing steep streets or a rescue drone squeezing through one-way passages—require much more effort in one direction than in the other, and some actions cannot be undone at all. Today’s learning algorithms usually ignore this imbalance, so they may choose routes that waste energy or put the robot in a trap. Our research introduces Quasi-Potential Reinforcement Learning (QPRL), a method that lets an agent recognize and plan around these one-way or “uneven-effort” situations. QPRL splits the cost of every move into two parts: a reusable “potential” map (showing, for example, how steep a hill is) and an extra penalty for truly irreversible steps (such as dropping off a ledge). A built-in safety check ensures the agent never takes a step whose future cost could exceed a small, user-set limit. Across several simulated navigation and control tasks, QPRL learns faster, reaches goals more reliably, and commits about four times fewer irreversible errors than existing approaches. These ideas could help future robots and self-driving vehicles act more safely and efficiently whenever undoing a decision is hard or impossible.
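One way to read the decomposition described above is that each step cost splits into a potential difference plus a non-negative irreversibility term, roughly c(s, s') = Φ(s') − Φ(s) + ψ(s, s'), with an estimate of future ψ-cost used to screen out unsafe actions. The sketch below illustrates only that reading, not the authors' implementation; every name in it (phi, psi, SafetyFilter, budget, irreversibility_value, model) is hypothetical.

```python
# Illustrative sketch only (not the authors' code): one way to read the
# quasi-potential cost split and the budgeted safety check from the abstract.
# All names here (phi, psi, SafetyFilter, budget, irreversibility_value, model)
# are hypothetical.

def step_cost(phi, psi, s, s_next):
    """Quasi-potential decomposition of one move's cost.

    phi : dict mapping state -> reusable potential (e.g. effort to reach that
          elevation); its difference gives the direction-dependent part.
    psi : dict mapping (state, next_state) -> non-negative penalty charged only
          to hard-to-undo transitions (0 for reversible moves).
    """
    return (phi[s_next] - phi[s]) + psi.get((s, s_next), 0.0)


class SafetyFilter:
    """Drops candidate actions whose one-step irreversibility penalty plus the
    estimated future irreversibility cost would exceed a small, user-set budget
    (a hedged sketch of the abstract's 'built-in safety check')."""

    def __init__(self, budget, irreversibility_value):
        self.budget = budget                    # user-set limit, e.g. 0.1
        self.irr_value = irreversibility_value  # dict: state -> estimated future psi-cost

    def allowed(self, s, actions, model):
        safe = []
        for a in actions:
            s_next, psi_step = model(s, a)      # predicted next state and its psi penalty
            if psi_step + self.irr_value.get(s_next, 0.0) <= self.budget:
                safe.append(a)
        return safe
```

Under this reading, the potential term cancels over any round trip, so it captures reusable, direction-dependent effort such as slope, while ψ accumulates only on hard-to-undo moves, which is what the user-set budget constrains.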