Lightning Unified Video Editing via In-Context Sparse Attention
Abstract
Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention cost creates a critical computational bottleneck. In this work, we propose In-Context Sparse Attention (ISA), the first experimentally lossless sparse framework tailored for ICL video editing. Our design is grounded in two key insights: first, context tokens exhibit significantly lower saliency than source tokens; second, we theoretically prove and empirically validate that query sharpness correlates with approximation error. Motivated by these findings, ISA applies an efficient pre-selection strategy that prunes redundant context tokens, followed by a dynamic query grouping mechanism that routes high-error queries to full attention and low-error queries to a computationally efficient zeroth-order Taylor sparse attention. Furthermore, we construct a scalable pipeline to curate a 1M-sample dataset, on which we train LIVEditor, a lightning video editing model equipped with ISA. Extensive experiments demonstrate that LIVEditor reduces latency by approximately 60% while surpassing state-of-the-art methods on EditVerseBench, IVE-Bench, and VIE-Bench, delivering experimentally lossless acceleration without compromising visual fidelity.
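To make the mechanism concrete, the sketch below gives one plausible PyTorch reading of the ISA pipeline described above: saliency-based context pre-selection, a sharpness proxy for query routing, and a zeroth-order Taylor correction for pruned keys. All function names, the entropy-based sharpness proxy, the mean-value approximation for pruned tokens, and every threshold are our own illustrative assumptions, not the paper's implementation; the full score matrix is also materialized here purely for readability, which a real sparse kernel would avoid.

    import torch
    import torch.nn.functional as F

    def saliency_prune(q, k_ctx, v_ctx, keep_ratio=0.5):
        # Pre-selection: context tokens carry lower saliency than source tokens,
        # so keep only the context keys with the highest mean attention score.
        scores = (q @ k_ctx.T).mean(dim=0)                  # (n_ctx,)
        n_keep = max(1, int(keep_ratio * k_ctx.shape[0]))
        idx = scores.topk(n_keep).indices
        return k_ctx[idx], v_ctx[idx]

    def query_sharpness(q, k):
        # Hypothetical sharpness proxy: entropy of each query's attention
        # distribution. Low entropy = peaked (sharp) query, which per the
        # abstract incurs larger sparse-approximation error.
        p = F.softmax((q @ k.T) / q.shape[-1] ** 0.5, dim=-1)
        return -(p * p.clamp_min(1e-9).log()).sum(dim=-1)   # (n_q,)

    def taylor_sparse_attention(q, k, v, topk):
        # Exact softmax over each query's top-k keys; the pruned keys enter
        # only through a constant (zeroth-order Taylor) term built from their
        # mean score and the mean value -- an illustrative simplification.
        d = q.shape[-1]
        s = (q @ k.T) / d ** 0.5                            # (n_q, n_k), dense for clarity only
        top_s, top_i = s.topk(topk, dim=-1)                 # (n_q, topk)
        m = top_s.max(dim=-1, keepdim=True).values          # numerical stabilizer
        w = (top_s - m).exp()                               # (n_q, topk)
        v_top = v[top_i]                                    # (n_q, topk, d)
        num = (w.unsqueeze(-1) * v_top).sum(dim=-2)         # (n_q, d)
        den = w.sum(dim=-1, keepdim=True)                   # (n_q, 1)
        n_pruned = s.shape[-1] - topk
        if n_pruned > 0:
            mean_s = (s.sum(-1, keepdim=True) - top_s.sum(-1, keepdim=True)) / n_pruned
            w0 = n_pruned * (mean_s - m).exp()              # constant correction weight
            num = num + w0 * v.mean(dim=0)
            den = den + w0
        return num / den

    def isa_attention(q, k_src, v_src, k_ctx, v_ctx, keep_ratio=0.5, topk=64, tau=2.0):
        # Step 1: prune low-saliency context tokens, keep source tokens intact.
        k_ctx, v_ctx = saliency_prune(q, k_ctx, v_ctx, keep_ratio)
        k = torch.cat([k_src, k_ctx], dim=0)
        v = torch.cat([v_src, v_ctx], dim=0)
        # Step 2: route sharp (high-error) queries to full attention and the
        # rest to the zeroth-order Taylor sparse path.
        sharp = query_sharpness(q, k) < tau
        out = torch.empty_like(q)
        if sharp.any():
            out[sharp] = F.scaled_dot_product_attention(
                q[sharp].unsqueeze(0), k.unsqueeze(0), v.unsqueeze(0)).squeeze(0)
        if (~sharp).any():
            out[~sharp] = taylor_sparse_attention(q[~sharp], k, v, min(topk, k.shape[0]))
        return out

    # Toy usage with hypothetical sizes: 256 queries, 512 source and 2048
    # context key/value tokens, head dimension 64.
    q = torch.randn(256, 64)
    k_src, v_src = torch.randn(512, 64), torch.randn(512, 64)
    k_ctx, v_ctx = torch.randn(2048, 64), torch.randn(2048, 64)
    out = isa_attention(q, k_src, v_src, k_ctx, v_ctx)      # (256, 64)

Under these assumptions, the savings come from two independent sources: pre-selection shrinks the key/value set before any attention runs, and the sparse path touches only topk values per query while folding every pruned key into a single constant term.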