Temporal Straightening for Latent Planning
Abstract
Learning good representations is essential for latent planning with world models. While pretrained visual encoders provide strong visual features, they are not tailored to planning and retain substantial information that is irrelevant to it. Inspired by the perceptual straightening hypothesis in human visual processing, we introduce temporal straightening for representation learning in latent planning. We add a lightweight projector on top of a pretrained visual encoder that maps its features to a lower-dimensional space, and we train it with a curvature regularizer that encourages locally straightened latent trajectories. We show that reducing curvature improves the conditioning of the planning objective, making gradient-based planning more stable and yielding significantly higher success rates across four goal-reaching tasks.
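To make the notion of trajectory curvature concrete, the following is a minimal sketch of one possible curvature penalty. It is an illustration under stated assumptions, not the regularizer used in this work: the abstract does not specify its exact form. Here we assume curvature is measured as one minus the cosine similarity between consecutive latent displacement vectors, a common choice in the perceptual straightening literature. The function name `curvature_penalty` and the parameter `eps` are hypothetical.

```python
import math


def curvature_penalty(latents, eps=1e-8):
    """Mean (1 - cosine similarity) between consecutive displacement vectors.

    ASSUMPTION: this is an illustrative curvature measure, not necessarily
    the regularizer used in the paper.

    latents: list of equal-length lists of floats, one per time step.
    Returns a value near 0 for a straight trajectory; larger values mean
    sharper turns between consecutive steps.
    """
    if len(latents) < 3:
        raise ValueError("need at least three time steps")

    # Displacements v_t = z_{t+1} - z_t between consecutive latents.
    steps = [
        [b - a for a, b in zip(z0, z1)]
        for z0, z1 in zip(latents, latents[1:])
    ]

    total = 0.0
    for v0, v1 in zip(steps, steps[1:]):
        dot = sum(a * b for a, b in zip(v0, v1))
        norm0 = math.sqrt(sum(a * a for a in v0))
        norm1 = math.sqrt(sum(b * b for b in v1))
        # eps guards against division by zero for stationary latents.
        total += 1.0 - dot / (norm0 * norm1 + eps)

    # Average over the consecutive displacement pairs.
    return total / (len(steps) - 1)


# A straight trajectory and one with two right-angle turns.
straight = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
zigzag = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]
```

In this sketch the straight trajectory receives a penalty close to zero, while the zigzag trajectory, whose consecutive steps are orthogonal, receives a penalty of one. In a training setup, a differentiable version of such a term would be added to the projector's loss with some weighting, which is likewise an assumption here.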