Dynamics Are Learned, Not Told: Semi-Supervised Discovery of Latent Dynamics Geometries for Zero-Shot Policy Adaptation
Abstract
Real-world dynamics shifts pose a critical challenge for reinforcement learning, yet prior methods typically rely on encoding explicitly identified physical parameters into a latent context, a rigid parameterization that proves brittle under unmodeled or compound dynamics variations. We instead investigate dynamics adaptation through the lens of latent geometry, and show theoretically that target-domain regret is controlled by the Lipschitz smoothness of a trajectory dynamics encoder. We further prove that this Lipschitz constant can be upper-bounded by optimizing a multi-positive InfoNCE objective, yielding a smooth, task-relevant latent topology without privileged dynamics information. On MuJoCo benchmarks, our method significantly outperforms explicit-identification baselines under severe dynamics shifts, including unmodeled structural failures, while simultaneously improving in-distribution stability and latent interpretability. Overall, these results establish controlling latent smoothness as a principled and scalable mechanism for robust adaptation.
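To make the training objective concrete, the following is a minimal NumPy sketch of a multi-positive InfoNCE loss in the supervised-contrastive style: each anchor embedding is attracted to all other samples sharing its (latent) dynamics regime and repelled from the rest. The function name, the label-based grouping of positives, and the temperature value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def multi_positive_infonce(z, labels, tau=0.1):
    """Multi-positive InfoNCE over a batch of trajectory embeddings.

    z:      (N, D) array of encoder outputs.
    labels: (N,) integer ids marking which samples come from the same
            dynamics regime (illustrative stand-in for the positive sets).
    tau:    softmax temperature.
    """
    # Cosine similarities between all pairs, scaled by temperature.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / tau

    n = sim.shape[0]
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # exclude anchor from denominator

    # Numerically stable log-softmax over each row (all non-anchor samples).
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )

    # Positives: same label, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask

    # Average negative log-likelihood over each anchor's positive set,
    # then over anchors.
    per_anchor = -np.where(pos_mask, log_prob, 0.0).sum(axis=1) \
        / pos_mask.sum(axis=1)
    return per_anchor.mean()
```

Embeddings that cluster tightly by dynamics regime drive this loss toward zero, which is the mechanism by which minimizing it encourages a smooth, regime-respecting latent geometry.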