Poster Mon, Jul 6, 2026 • 6:30 PM – 8:15 PM PDT HALL A #1509

Temporal Straightening for Latent Planning

Ying Wang ⋅ Oumayma Bounou ⋅ Gaoyue Zhou ⋅ Randall Balestriero ⋅ Tim G. J. Rudner ⋅ Yann LeCun ⋅ Mengye Ren

Abstract

Learning good representations is essential for latent planning with world models. While pretrained visual encoders produce strong semantic visual features, they are not tailored to planning and contain information that is irrelevant, or even detrimental, to planning. Inspired by the perceptual straightening hypothesis in human visual processing, we introduce temporal straightening to improve representation learning for latent planning. Using a curvature regularizer that encourages locally straightened latent trajectories, we jointly learn the encoder and the predictor of a Joint-Embedding Predictive Architecture (JEPA) world model. We show that reducing curvature in this way makes the Euclidean distance in latent space a better proxy for the geodesic distance and improves the conditioning of the planning objective. We demonstrate empirically that temporal straightening makes gradient-based planning more stable and yields significantly higher success rates across a suite of goal-reaching tasks. Our code is available at https://agenticlearning.ai/temporal-straightening.
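To make the idea of trajectory curvature concrete, the following is a minimal sketch and not the authors' implementation. It assumes curvature is measured as the angle between consecutive latent "velocity" vectors, as is common in the perceptual straightening literature; the exact regularizer used in this work may differ. The function name `mean_curvature` and the toy trajectories are hypothetical.

```python
import math


def mean_curvature(latents):
    """Mean turning angle (radians) between consecutive latent velocities.

    latents: list of latent states z_0..z_T, each a list of floats.
    Assumption: curvature at step t is the angle between
    v_t = z_{t+1} - z_t and v_{t+1} = z_{t+2} - z_{t+1}.
    """
    # Velocity vectors between consecutive latent states.
    velocities = [
        [b - a for a, b in zip(z0, z1)]
        for z0, z1 in zip(latents, latents[1:])
    ]
    angles = []
    for v0, v1 in zip(velocities, velocities[1:]):
        dot = sum(a * b for a, b in zip(v0, v1))
        norm = math.hypot(*v0) * math.hypot(*v1)
        if norm == 0.0:
            continue  # skip stationary steps, where the angle is undefined
        # Clamp to [-1, 1] to guard against floating-point drift.
        cos = max(-1.0, min(1.0, dot / norm))
        angles.append(math.acos(cos))
    return sum(angles) / len(angles) if angles else 0.0


# Hypothetical 2-D toy trajectories.
straight = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
zigzag = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]

print(mean_curvature(straight))  # 0.0
print(mean_curvature(zigzag))    # about 1.5708 (pi/2)
```

In training, a term of this kind would presumably be computed on encoder outputs in an autodiff framework and added to the prediction loss with a weighting coefficient; that weighting is an assumption here, not a detail stated in the abstract.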

Lay Summary

A world model learns to predict how the world will change given the current state and action, then uses those predictions for planning. However, in many latent world models, the learned representation is not naturally organized for planning and control: trajectories that are feasible in the real environment can become highly curved in latent space, making prediction and planning hard. Inspired by the perceptual straightening hypothesis in neuroscience, which holds that the human visual system transforms natural videos into straighter internal trajectories, we introduce a geometric regularizer for JEPA world models. During training, the model learns to predict future latent states, while we encourage latent trajectories to have lower curvature. This produces straighter representations in which latent Euclidean distances are more meaningful and planning objectives become easier to optimize. This simple idea significantly improves success rates on goal-reaching planning tasks. Our results suggest that good world-model representations should not only encode the world, but also make future states easier to predict and plan through.
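The claim that straighter trajectories make Euclidean distances more meaningful can be illustrated with a small sketch. It rests on an assumption made here for illustration only: the path length along an observed latent trajectory stands in for the geodesic distance between its endpoints. The helper names `path_length` and `straightness_ratio` are hypothetical.

```python
import math


def path_length(latents):
    """Sum of Euclidean step lengths along a latent trajectory."""
    return sum(math.dist(a, b) for a, b in zip(latents, latents[1:]))


def straightness_ratio(latents):
    """Endpoint Euclidean distance divided by path length, in [0, 1].

    A ratio of 1.0 means the straight-line distance between the
    endpoints equals the distance travelled along the trajectory.
    """
    total = path_length(latents)
    if total == 0.0:
        return 1.0  # a stationary trajectory is trivially straight
    return math.dist(latents[0], latents[-1]) / total


# Hypothetical 2-D toy trajectories.
straight = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
zigzag = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]

print(straightness_ratio(straight))  # 1.0
print(straightness_ratio(zigzag))    # about 0.745
```

When the ratio is close to 1, a planner that minimizes Euclidean distance to the goal latent is effectively minimizing the distance along the trajectory as well; when it is well below 1, the two can disagree.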