Learning Anisotropic Value Geometry with Finsler Reinforcement Learning
Jumman Hossain ⋅ Nirmalya Roy
Abstract
We introduce **Finslerian Reinforcement Learning (FiRL)**, an RL framework that makes directional costs explicit and improves robustness to tail risk. FiRL incorporates a *Finsler metric* into the locomotion cost, expressing per-step effort as a cost $F(x,v)$ that depends on both the state $x$ and the direction of motion $v$, so it can capture uphill versus downhill asymmetry, lateral slip, and other direction-dependent effects. To handle rare but catastrophic outcomes, FiRL optimizes a Conditional Value-at-Risk (CVaR) objective. We derive the corresponding risk-sensitive Bellman equation and show that the resulting CVaR–Finsler Bellman operator is a $\gamma$-contraction. This guarantees a unique fixed-point value function, which induces a *quasi-metric* structure that satisfies a triangle inequality despite directional asymmetry. We then develop a FiRL actor–critic algorithm to learn policies under this anisotropic, risk-averse objective. Across MuJoCo and Isaac Sim locomotion benchmarks, FiRL consistently learns safer and more energy-efficient behaviors than strong baselines such as risk-neutral PPO. For instance, on a $12^\circ$ sloped Hopper task, FiRL reduces worst-case impact forces by over 35% and total energy cost by 15%, while also improving the success rate.
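To make the directional-cost idea concrete, the sketch below implements a Randers-type Finsler cost $F(x,v) = \sqrt{v^\top M(x)\,v} + b(x)^\top v$, one standard family of Finsler metrics that reduces to a Riemannian (direction-symmetric) metric when the drift term $b$ vanishes. This is a minimal illustration of anisotropy, not the paper's actual cost model; the slope angle, drift strength, and constant $M$, $b$ are all illustrative assumptions.

```python
import numpy as np

def randers_cost(x, v, slope_deg=12.0, drift_strength=0.3):
    """Illustrative Randers-type Finsler cost F(x, v) (not the paper's exact model).

    F(x, v) = sqrt(v^T M(x) v) + b(x)^T v
    The quadratic term is a symmetric (Riemannian) effort; the linear drift
    term b(x)^T v breaks the v -> -v symmetry, so moving against the drift
    (e.g., uphill) costs more than moving with it. Positivity of F requires
    the Randers condition b^T M^{-1} b < 1.
    """
    # x is unused in this toy example; in general both M(x) and b(x) vary with state.
    M = np.eye(len(v))  # state-dependent metric tensor; identity here for simplicity
    slope = np.deg2rad(slope_deg)
    b = drift_strength * np.array([np.sin(slope), 0.0])  # drift aligned with the uphill direction
    assert b @ np.linalg.solve(M, b) < 1.0, "Randers condition violated"
    return float(np.sqrt(v @ M @ v) + b @ v)

x = np.zeros(2)
uphill, downhill = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
print(randers_cost(x, uphill))    # ~1.062: moving against the drift is dearer
print(randers_cost(x, downhill))  # ~0.938: moving with the drift is cheaper
```

The asymmetry $F(x,v) \neq F(x,-v)$ visible in the two printed costs is exactly what makes the induced value function a *quasi-metric* rather than a metric: distances can differ by direction while a triangle inequality still holds.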
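For orientation on the risk-sensitive recursion, one standard way to write a CVaR Bellman operator over an augmented state $(x,\alpha)$ uses the dual "risk-envelope" representation of CVaR. The sketch below instantiates that textbook form with the Finsler cost as the stage cost; it is an assumption about the operator's general shape, not the paper's exact definition, and $v_a$ (the motion induced by action $a$) is a notational assumption.

$$
(\mathcal{T}V)(x,\alpha) \;=\; \min_{a}\Big[\, F(x, v_a) \;+\; \gamma \max_{\xi \in \mathcal{U}(\alpha,\, P(\cdot \mid x,a))} \sum_{x'} \xi(x')\, P(x' \mid x,a)\, V\big(x',\, \alpha\,\xi(x')\big) \Big],
$$

$$
\mathcal{U}(\alpha, P) \;=\; \Big\{\, \xi \;:\; 0 \le \xi(x') \le \tfrac{1}{\alpha}, \;\; \textstyle\sum_{x'} \xi(x')\, P(x') = 1 \,\Big\}.
$$

Because every reweighted kernel $\xi(\cdot)\,P(\cdot)$ in the envelope is itself a probability distribution, the usual sup-norm argument yields $\lVert \mathcal{T}V_1 - \mathcal{T}V_2 \rVert_\infty \le \gamma \lVert V_1 - V_2 \rVert_\infty$, which is the $\gamma$-contraction property the abstract cites.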