Real Data Lies: Unveiling and Closing the Quality Shortcut in Generalizable AI-Generated Video Detection
Abstract
Recent advances in video generation have enabled highly realistic synthetic content, raising concerns about the integrity of digital media and motivating the development of benchmarks and detection methods for generated videos. Prior work has largely prioritized improving model generalization to unseen generators. However, we uncover a neglected factor: the quality distribution of real videos plays a pivotal role. Current training protocols suffer from a clear quality bias between real and fake data, making models prone to shortcut learning. When models are also evaluated on real data drawn from similar distributions, this bias creates an illusion of generalization. In reality, these models fail to generalize when exposed to real data with significantly different quality profiles. To address this, we propose training with quality-matched real and fake data to mitigate this bias. Building on this, we introduce a data expansion strategy that broadens the training set to cover the full quality spectrum. This approach enables the model to learn quality-agnostic features for detection, thereby achieving generalization across real data of varying qualities and enhancing real-world applicability. Extensive experiments demonstrate that our method scales well across diverse backbones, consistently enhancing the generalization capability of existing models.
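To make the quality-matching idea concrete, below is a minimal illustrative sketch, not the paper's actual pipeline. It assumes each real and fake video already carries a scalar no-reference quality score (the specific metric, the bin count, and the helper names such as `quality_matched_pairs` are assumptions for illustration): videos are bucketed into quality bins, and within each bin the two classes are sampled to equal counts so that quality can no longer serve as a shortcut for the real/fake label.

```python
# A minimal sketch of quality-matched real/fake sampling, assuming each video
# has a precomputed scalar quality score (the metric itself is not specified here).
import random
from collections import defaultdict

def quality_bin(score, num_bins=10, lo=0.0, hi=100.0):
    """Map a scalar quality score into a discrete bin index."""
    score = min(max(score, lo), hi)
    return min(int((score - lo) / (hi - lo) * num_bins), num_bins - 1)

def quality_matched_pairs(real_videos, fake_videos, num_bins=10, seed=0):
    """Build a training list whose real and fake halves share the same
    per-bin quality distribution.

    real_videos / fake_videos: lists of (path, quality_score) tuples.
    Returns a shuffled list of (path, label) with label 0 = real, 1 = fake.
    """
    rng = random.Random(seed)
    real_by_bin, fake_by_bin = defaultdict(list), defaultdict(list)
    for path, q in real_videos:
        real_by_bin[quality_bin(q, num_bins)].append(path)
    for path, q in fake_videos:
        fake_by_bin[quality_bin(q, num_bins)].append(path)

    matched = []
    for b in range(num_bins):
        reals, fakes = real_by_bin[b], fake_by_bin[b]
        if not reals or not fakes:
            continue  # skip bins where one class has no samples
        k = min(len(reals), len(fakes))  # equalize class counts within the bin
        matched += [(p, 0) for p in rng.sample(reals, k)]
        matched += [(p, 1) for p in rng.sample(fakes, k)]
    rng.shuffle(matched)
    return matched
```

Under this sketch, the data expansion described in the abstract would amount to populating the sparsely covered quality bins (for example with additional real footage or re-encoded variants) before matching, so that the balanced set spans the full quality spectrum rather than only the range where real and fake data happen to overlap.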