CORE: Conflict-Oriented Reasoning for General Multimodal Manipulation Detection
Jinjie Shen ⋅ Yaxiong Wang ⋅ Yujiao Wu ⋅ Lechao Cheng ⋅ Tianrui Hui ⋅ Nan Pu ⋅ Zhihui Li ⋅ Zhun Zhong
Abstract
The rapid rise of generative AI has made multimodal fake news increasingly realistic and pervasive, posing severe threats to public trust and social stability. Existing detection methods rely heavily on manipulation-specific models and large-scale labeled data, resulting in poor generalization to emerging manipulation types. We observe that the essence of manipulated misinformation lies in its intrinsic conflicts, i.e., semantic or physical inconsistencies either across modalities or with common world knowledge. Inspired by this observation, we propose the Conflict-Oriented REasoning (CORE) framework, an effective paradigm that endows multimodal large language models (MLLMs) with explicit conflict-capturing capability. To this end, CORE first constructs the Conflict Attribution Corpus (CAC) with fine-grained annotations of conflict factors and sources, providing essential data support for subsequent conflict-perception training. By performing conflict-oriented representation enhancement and reasoning based on CAC, CORE achieves robust and generalizable conflict detection, adapting effectively and rapidly to unseen manipulation types with only a few samples or even in zero-shot settings. Extensive experiments demonstrate that CORE surpasses state-of-the-art models by 9.7\%, 14.1\%, and 11.8\% in accuracy on the DGM$^4$, MMFakeBench, and MDSM benchmarks, respectively.