Path-Coupled Bellman Flows for Distributional Reinforcement Learning
Boyang Xu ⋅ Qing Zou ⋅ Siqin Yang ⋅ Hao Yan
Abstract
Distributional RL models the full return distribution, but common categorical/quantile approaches rely on projection and independently sampled Bellman targets, which ignore the Bellman operator’s affine transport structure and yield high-variance learning signals. We introduce Path-Coupled Bellman Flows, a flow-matching framework that shares base noise to couple the generative trajectories of consecutive states, inducing a geometric Bellman scaling law between their velocity fields. This geometry motivates a $\lambda$-family of Bellman-flow objectives that functions as a control variate, reducing variance while retaining the same Bellman-consistent fixed point. Across toy diagnostics and offline RL benchmarks (OGBench, D4RL), our method improves training stability and achieves competitive or improved performance relative to prior distributional baselines.
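The "geometric Bellman scaling law" is only named in the abstract; as a minimal illustrative sketch (not necessarily the paper's exact construction), assume straight-line flow-matching paths, a shared base-noise sample $\epsilon$, and the one-step distributional Bellman target $Z(s) \stackrel{d}{=} r + \gamma Z(s')$. Coupling the current-state path to the next-state path through the affine Bellman map then ties their velocities together:
\begin{align*}
  x_t^{s'} &= (1 - t)\,\epsilon + t\,Z(s') && \text{next-state path, shared base noise } \epsilon,\\
  x_t^{s}  &= r + \gamma\, x_t^{s'} && \text{current-state path via the affine Bellman map},\\
  \tfrac{d}{dt} x_t^{s} &= \gamma\,\tfrac{d}{dt} x_t^{s'} && \Longrightarrow\quad v^{s}\!\bigl(r + \gamma x,\, t\bigr) = \gamma\, v^{s'}(x, t).
\end{align*}
Under such a coupling the current-state velocity field would be a $\gamma$-scaled, $r$-shifted copy of the next-state field, the kind of shared structure a $\lambda$-family of Bellman-flow objectives could exploit as a control variate.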