Forensic Prompting with Dual-Action Policy Optimization for Vision-Language Forgery Detection and Localization
Abstract
Image forgery is evolving rapidly, and forensic traces are becoming increasingly subtle and easily attenuated by post-processing. Although vision-language prompting can inject useful priors, open-ended LLM-generated prompts are difficult to constrain, and naive language descriptions can introduce semantic perturbations. To address these challenges, we propose Forensic Prompting with Dual-Action Policy Optimization (FPDA) for vision-language forgery detection and localization. Its Forensic Prompting Module (FPM) constructs a structured forensic prompt bank and supports optional text input, using a reliability gate to provide lightweight, stable conditioning. In addition, Dual-Action Policy Optimization (DAPO) adaptively routes prompts and schedules refinement strategies on a per-image basis, stabilizing discriminative cues and improving the spatial consistency of predicted masks. Extensive experiments on multiple public datasets covering image manipulations, diffusion-generated content, face forgeries, and text-enabled settings (e.g., CASIA/NIST16/Coverage, CocoGlide, OpenForensics, and SID-Set-description) demonstrate superior detection and localization performance over state-of-the-art methods.