Trajectory-Stabilized Inference for Diffusion-Based Video Inpainting
Abstract
Video inpainting aims to restore missing regions while preserving spatial and temporal coherence. Diffusion-based methods achieve strong per-frame reconstruction, but their sampling implicitly generates temporally coupled latent trajectories whose long-horizon stability is not explicitly modeled, forcing a trade-off between temporal consistency and structural detail. We revisit video inpainting from the perspective of temporal trajectory stability, treating temporal inconsistency as instability along time-indexed denoising trajectories rather than as an output-level error. Building on this view, we propose an inference-time trajectory stabilization framework that monitors motion-aligned deviation and triggers risk-aware correction only when instability accumulates. The framework combines sparsely sampled trajectory anchors, which serve as stability references, with neighborhood-consistent propagation to regulate trajectory evolution while preserving local generative freedom. Implemented as a lightweight control layer in the sampling loop, it selectively contracts unstable trajectories toward motion-consistent manifolds rather than enforcing uniform temporal constraints. Experiments show consistent improvements in both temporal coherence and structural fidelity.
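As a rough illustration of the control layer described above, the NumPy sketch below shows one way a per-step stabilization rule could look. Everything here is an assumption for exposition, not the paper's implementation: warp, stabilize_step, the flow representation, and the values of tau, alpha, and decay are hypothetical. The step measures motion-aligned deviation against the flow-warped previous-frame latent, accumulates it into a decayed risk score, and contracts the latent toward a warped anchor only when that score crosses a threshold, leaving low-risk steps untouched.

```python
import numpy as np

def warp(latent, flow):
    # Hypothetical motion alignment: an integer-pixel shift standing in
    # for a real optical-flow warp of the latent.
    dy, dx = int(round(flow[0])), int(round(flow[1]))
    return np.roll(latent, shift=(dy, dx), axis=(0, 1))

def stabilize_step(z_t, z_prev, anchor, flow, risk,
                   tau=1.5, alpha=0.3, decay=0.9):
    # Monitor motion-aligned deviation: compare the current latent to the
    # flow-warped previous-frame latent (per-element RMS magnitude).
    z_ref = warp(z_prev, flow)
    deviation = np.linalg.norm(z_t - z_ref) / np.sqrt(z_t.size)
    # Accumulate instability into a decayed running risk score.
    risk = decay * risk + deviation
    if risk > tau:
        # Risk-aware correction: contract toward the motion-aligned anchor
        # only when accumulated instability exceeds the threshold.
        z_t = (1.0 - alpha) * z_t + alpha * warp(anchor, flow)
        risk = 0.0
    return z_t, risk

# Toy usage: stabilize a sequence of per-frame latents at one denoising step.
rng = np.random.default_rng(0)
latents = [rng.standard_normal((16, 16)) for _ in range(8)]
anchor = latents[0]   # a sparsely sampled anchor (here simply the first frame)
flow = (1.0, 0.0)     # assumed constant motion for this toy example
risk = 0.0
for i in range(1, len(latents)):
    latents[i], risk = stabilize_step(latents[i], latents[i - 1],
                                      anchor, flow, risk)
```

Resetting the risk score after each correction keeps interventions sparse, mirroring the abstract's point that unstable trajectories are contracted selectively rather than subjected to a uniform temporal constraint.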