Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them
Woojung Han ⋅ Seil Kang ⋅ Youngjun Jun ⋅ Min-Hung Chen ⋅ Fu-En Yang ⋅ Seong Jae Hwang
Abstract
Video diffusion models can generate visually stunning content, yet frequently produce motion that violates physical laws: objects accelerate implausibly or vanish mid-trajectory. We reveal a surprising finding: a 2-step generation often exhibits better physical consistency than a 50-step output from the same model. Through spectral analysis, we trace this to phase erosion during denoising: motion dynamics are $8.5\times$ more sensitive to phase corruption than to magnitude corruption, yet the refinement process progressively destroys this critical component. Building on this insight, we propose PhaseLock, a training-free framework that locks motion dynamics to fast inference priors. Rather than requiring 50 steps to establish physics, PhaseLock extracts a motion prior from just 2 steps and enforces it onto the high-fidelity generation via Latent Delta Guidance. This decouples physical consistency from visual refinement, ensuring the final output remains grounded in valid trajectories. PhaseLock achieves strong physical consistency with negligible overhead ($1.06\times$ time, $1.02\times$ memory), eliminating the need for expensive external guidance methods ($\sim5\times$ time).
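The abstract's central claim, that motion structure lives in spectral phase rather than magnitude, can be illustrated with a classic phase-swap experiment (in the spirit of Oppenheim's phase-importance result). The sketch below is not the paper's method; it uses hypothetical 1-D signals standing in for motion trajectories and shows that a hybrid built from one signal's magnitude and another's phase follows the phase donor.

```python
import numpy as np

# Illustrative only: two unrelated broadband signals stand in for
# per-frame motion trajectories; names and setup are assumptions.
rng = np.random.default_rng(0)
N = 1024
a = rng.standard_normal(N)  # "trajectory" contributing magnitude
b = rng.standard_normal(N)  # "trajectory" contributing phase

A, B = np.fft.fft(a), np.fft.fft(b)

# Hybrid spectrum: magnitude of `a`, phase of `b`.
hybrid = np.real(np.fft.ifft(np.abs(A) * np.exp(1j * np.angle(B))))

corr_a = np.corrcoef(hybrid, a)[0, 1]  # similarity to magnitude donor
corr_b = np.corrcoef(hybrid, b)[0, 1]  # similarity to phase donor
print(f"correlation with magnitude donor a: {corr_a:+.3f}")
print(f"correlation with phase donor b:     {corr_b:+.3f}")
# The hybrid correlates strongly with the phase donor and only weakly
# with the magnitude donor, echoing why eroding phase erases motion.
```

Under this toy setup the hybrid's correlation with the phase donor is large while its correlation with the magnitude donor is near zero, which is the intuition behind locking phase-borne dynamics before visual refinement.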