Flow Equivariant World Models: Structured Memory for Dynamic Environments
Abstract
The natural world is richly structured over space and time. Much of this structure arises from the interplay between spatial geometry and motion. However, most existing world models ignore this structure and consequently fail to generalize in dynamic environments. In this work, we show that enforcing equivariance between an agent's representations and the world's dynamics necessarily induces an efficient, structured memory. Concretely, we introduce Flow Equivariant World Modeling, a framework in which both self-motion and external object motion are unified as one-parameter Lie-group ``flows'' acting on a latent world memory, and in which models are built to be equivariant with respect to these transformations. On 2D and 3D partially observed video world modeling benchmarks, we demonstrate that Flow Equivariant World Models significantly outperform comparable state-of-the-art diffusion-based and memory-augmented world modeling architectures in their ability to track and predict the locations of moving objects over long horizons. Project page: https://anonflowm.github.io/