SpeedVFI: One-step Diffusion for Efficient Video Frame Interpolation
Abstract
Generative video diffusion models have shown strong robustness to large motion and occlusions for video frame interpolation (VFI). However, their inference efficiency lags significantly behind that of conventional learning-based methods, due to the structural redundancy of pairwise inference and the procedural latency of multi-step iterative denoising. To address these limitations, we propose SpeedVFI, a one-step diffusion framework that tackles both bottlenecks: it interpolates the entire video sequence in a single forward pass, eliminating pairwise overhead, and distills the generation trajectory into a one-step denoising process, bypassing iterative latency. To support this high-efficiency architecture, we introduce temporal RoPE alignment to ensure temporal consistency across the unified sequence, and noise-centric partial attention to reduce computational overhead while preserving global context. Extensive experiments demonstrate that SpeedVFI accelerates diffusion-based VFI by orders of magnitude while maintaining competitive quantitative and visual quality.
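As an illustrative sketch only (not the paper's implementation), the intuition behind temporal RoPE alignment can be captured by applying rotary position embeddings along the time axis with fractional frame indices, so that keyframes and to-be-interpolated frames share one consistent temporal coordinate system. The function name `temporal_rope` and the half-split rotation layout are assumptions for this example:

```python
import numpy as np

def temporal_rope(x, times, base=10000.0):
    """Rotate feature pairs by angles proportional to (possibly fractional)
    frame times, giving a shared temporal coordinate system.

    x     : (T, D) per-frame features, D even
    times : length-T sequence of frame indices (e.g. [0.0, 0.5, 1.0])
    """
    T, D = x.shape
    half = D // 2
    # Geometric frequency schedule, as in standard RoPE.
    freqs = base ** (-np.arange(half) / half)           # (half,)
    angles = np.asarray(times, float)[:, None] * freqs  # (T, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation of each (x1_i, x2_i) pair by its time-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=1)
```

A key property this preserves is that attention scores between two rotated frames depend only on their time difference, so an interpolated frame placed at t = 0.5 relates to its neighbors the same way wherever the window sits in the unified sequence.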