Action Manifold Smoothing: A Lipschitz Pathway Perspective on High-Dimensional Reinforcement Learning
Zhihao Lin
Abstract
High-dimensional continuous control remains challenging in deep reinforcement learning, where algorithms such as TD3 and SAC often collapse. We propose a unifying \textbf{Lipschitz Pathway} framework that decomposes this instability into four amplification stages: action parameterization ($L_1$), dynamics sensitivity ($L_2$), Q-network curvature ($L_3$), and temporal-difference (TD) target stability ($L_4$), along which errors compound multiplicatively through the learning pipeline. Our analysis identifies a \textit{discrete-continuous mismatch} as the root cause: value functions trained on sparse point samples must generalize over continuous action manifolds, so local errors are amplified at every stage of the pathway. To address this, we introduce \textbf{Action Manifold Smoothing (AMS)}, which replaces pointwise TD targets with orthogonally sampled neighborhood averages, jointly regularizing $L_3$ (via implicit Laplacian smoothing) and $L_4$ (via local manifold supervision). We further characterize, based on task structure, when Lipschitz-constrained Q-networks and geometric action priors are beneficial. Empirically, AMS enables both TD3 and SAC to achieve a return above 400 on the 38-D Dog Run task within 1M steps, where both baselines fail. These results validate the Lipschitz pathway as a principled framework for diagnosing and resolving stability bottlenecks in high-dimensional control.
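To make the target construction concrete, below is a minimal PyTorch sketch of one plausible reading of an AMS-style smoothed TD target. The critic callable \texttt{q\_target\_net}, the sample count, the neighborhood \texttt{radius}, and the $[-1,1]$ action box are all illustrative assumptions, not the paper's exact implementation; the perturbation directions are orthonormalized per batch element via a QR decomposition as one interpretation of "orthogonally sampled".

\begin{verbatim}
import torch

@torch.no_grad()  # TD targets are conventionally computed without gradients
def ams_td_target(q_target_net, next_obs, next_action, reward, discount,
                  num_samples=4, radius=0.05):
    """Hypothetical sketch: average the target critic over an
    orthogonally sampled neighborhood of a' instead of the single
    point Q(s', a'). Requires num_samples <= action dimension."""
    batch, act_dim = next_action.shape

    # Random Gaussian directions, orthonormalized per batch element so
    # the perturbations probe independent tangent directions.
    dirs = torch.randn(batch, act_dim, num_samples,
                       device=next_action.device)
    ortho, _ = torch.linalg.qr(dirs)  # columns are orthonormal

    # Neighborhood of a' on the action manifold; clamp to the assumed
    # [-1, 1] action box.
    perturbed = next_action.unsqueeze(-1) + radius * ortho
    perturbed = perturbed.clamp(-1.0, 1.0)

    # Neighborhood average replaces the point-wise target value.
    qs = torch.stack([q_target_net(next_obs, perturbed[..., k])
                      for k in range(num_samples)], dim=0)
    smoothed_q = qs.mean(dim=0)

    # Termination masking is omitted for brevity.
    return reward + discount * smoothed_q
\end{verbatim}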