Motion Dynamics Learning for Few-Shot Embodied Adaptation
Abstract
Vision-Language-Action (VLA) models have shown strong potential for robotic manipulation, yet adapting pretrained models to novel tasks typically relies on substantial task-specific demonstrations, which limits scalability. Current VLA methods focus mostly on action imitation, ignoring the richer structure contained in trajectories. In contrast, the motion dynamics that govern how actions evolve over time are more informative and transferable, making them better suited for few-shot adaptation. Motivated by this observation, we propose DynVLA, a few-shot adaptation system that reformulates VLA learning from action imitation to trajectory-level motion dynamics modeling. Specifically, we propose the Motion Dynamics Mechanism (MDM), which distills latent physical regimes from trajectories via flow-matching inversion, yielding compact representations that capture the underlying dynamics. We further design Dynamics-Constrained Modeling (DCM), which projects these inferred representations onto a Dynamics Bank storing motion priors pretrained on diverse demonstrations. By grounding action generation in these learned priors, the system can interpolate among existing motion patterns to represent novel dynamics modes. Experiments on 13 real-world tasks demonstrate that DynVLA outperforms existing state-of-the-art systems by 19\% in average success rate with only 10-20 demonstrations, highlighting its few-shot adaptation capability in real-world scenes.
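To make the DCM step more concrete, the following is a minimal sketch of projecting an inferred dynamics representation onto a Dynamics Bank and generating an action as a softmax-weighted interpolation over stored priors. The bank layout, dimensions, and attention-style weighting are illustrative assumptions, not the exact DynVLA formulation.

```python
# Illustrative sketch of Dynamics-Constrained Modeling (DCM): project an
# inferred dynamics code onto a bank of pretrained motion priors, then
# generate an action as a convex interpolation of those priors.
# All names, dimensions, and the softmax weighting are assumptions made
# for illustration only.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

d_latent = 64   # dimensionality of the inferred dynamics representation
n_priors = 32   # number of motion priors stored in the Dynamics Bank
d_action = 7    # e.g., a 7-DoF end-effector action

# Dynamics Bank: keys index the priors, values hold prior action parameters.
bank_keys = torch.randn(n_priors, d_latent)   # pretrained prior embeddings
bank_vals = torch.randn(n_priors, d_action)   # associated action prototypes

def dcm_project(z: torch.Tensor) -> torch.Tensor:
    """Project an inferred dynamics code z of shape (d_latent,) onto the
    bank and return an action as a softmax-weighted interpolation of the
    stored priors."""
    scores = bank_keys @ z / d_latent ** 0.5   # similarity to each prior
    weights = F.softmax(scores, dim=0)         # convex combination weights
    return weights @ bank_vals                 # interpolated action

z = torch.randn(d_latent)   # stand-in for the MDM-inferred dynamics code
action = dcm_project(z)
print(action.shape)         # torch.Size([7])
```

Because the weights form a convex combination, the generated action always lies within the span of the stored priors, which is one plausible reading of "grounding action generation in learned priors".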