Beyond Pixels: Mining Compressed Domain Artifacts for Efficient AI-Generated Video Detection
Abstract
With the rapid advancement of high-fidelity video generation models, robust AI-generated video (AIGV) detection has become increasingly urgent. Most existing AIGV detectors operate in the decoded pixel domain, which inevitably entangles task-irrelevant semantic information, incurring substantial semantic redundancy and computational overhead, while overlooking free-to-use signals in compressed bitstreams. In particular, motion vectors and residuals directly encode temporal and spatial generative artifacts, yet they remain largely underexplored. To address these issues, we propose STREAM, a unified framework for Spatio-Temporal REsidual and Artifact Mining that detects AIGVs directly from compressed bitstreams. STREAM leverages I-frames, motion vectors, and residual errors to capture spatiotemporal artifacts that are typically smoothed out by decompression filters. Specifically, we design a lightweight network with a motion-guided alignment module and a gated fusion mechanism, enabling adaptive fusion of spatial artifacts and nonlinear temporal dynamics. Extensive experiments demonstrate that STREAM achieves state-of-the-art performance with an mAP of 0.965 while delivering 2.5× faster inference than the previous state-of-the-art baseline.