Breaking Manifold Continuity: Vector Quantized Modeling for Real-Centric Deepfake Detection
Abstract
Increasingly realistic and diverse generative data have led some deepfake detection methods to shift toward learning robust representations of real content, \textit{e.g.}, via reconstruction-based tasks. However, most existing approaches rely on continuous modeling (\textit{e.g.}, GMMs, VAEs, diffusion models) to construct a continuous latent manifold of real data, aiming to improve generalization, while overlooking a critical issue: such continuity may facilitate the interpolation of forgery artifacts, consequently causing ambiguity in detection. To alleviate this problem, we integrate discrete modeling into the feature space of the CLIP vision encoder, striking a balance between continuous manifold modeling and discrete representation. By incorporating a learnable vector quantized codebook, the real latent manifold is discretized, imposing a more stringent information bottleneck that reduces the likelihood of embedding generative artifacts. To further enhance the generalization of discrete modeling, we propose an adaptive tangent space projection mechanism that yields a continuous relaxation of the discrete real distribution within a controllable range. With these components, our method constructs a real distribution that is both tightly constrained and broadly generalizable, improving robustness to unseen forgeries. Extensive experiments on diverse datasets demonstrate the effectiveness of our method.
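The core discretization step described above can be illustrated with a minimal sketch of nearest-neighbor vector quantization, as in standard VQ-VAE-style codebooks; the codebook size, feature dimension, and function name below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def quantize(features, codebook):
    """Snap each feature vector to its nearest codebook entry.

    features: (N, D) encoder embeddings (e.g., from a vision encoder).
    codebook: (K, D) learnable codes spanning the "real" manifold.
    Returns the quantized features and the selected code indices.
    """
    # Pairwise squared Euclidean distances, shape (N, K).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)          # nearest code per feature
    return codebook[idx], idx           # discretized representation

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))      # K=8 codes, D=4 dims (toy values)
feats = rng.normal(size=(3, 4))
quantized, idx = quantize(feats, codebook)
```

Because every output is constrained to be one of the K codes, the quantizer acts as a hard information bottleneck: off-manifold inputs (e.g., those carrying forgery artifacts) cannot be represented exactly and incur a larger quantization error, which is the intuition behind discretizing the real latent manifold.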