From Internal Diagnosis to External Auditing: A VLM-Driven Paradigm for Data-Free Online Backdoor Defense
Abstract
Deep Neural Networks (DNNs) remain fundamentally vulnerable to backdoor attacks. Traditional data-free defenses largely follow an internal-diagnosis paradigm, relying on methods such as model repair or input-level robustness checks, yet these approaches are often fragile under advanced attacks because they remain entangled with the victim model’s corrupted parameters. We propose a paradigm shift to data-free External Semantic Auditing, which uses universal Vision-Language Models (VLMs) as independent auditors to decouple the defense from the compromised model. We introduce PRISM (Prototype Refinement & Inspection via Statistical Monitoring), which transforms generic VLMs into domain-adaptive gatekeepers purely through online test-time adaptation. PRISM bridges the domain gap with a Hybrid VLM Teacher that refines class prototypes from the test stream and an Adaptive Router that calibrates decision thresholds via statistical monitoring. Evaluation across 17 datasets and 11 attack types confirms that PRISM achieves state-of-the-art performance (suppressing the Attack Success Rate to below 1% on CIFAR-10), demonstrating that robust defense is achievable without touching the model weights or accessing a single training sample.
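To make the two mechanisms named in the abstract concrete, the following is a minimal, self-contained sketch of the general idea: per-class prototype embeddings are refined online from the test stream via an exponential moving average, and a decision threshold is calibrated from running similarity statistics. All class names, hyperparameters, and the random "embeddings" here are illustrative assumptions, not PRISM's actual components or VLM features.

```python
import numpy as np

class PrismSketch:
    """Toy illustration of prototype refinement + statistical threshold
    monitoring. Hypothetical simplification, not the paper's implementation."""

    def __init__(self, num_classes, dim, momentum=0.9, k=3.0):
        rng = np.random.default_rng(0)
        protos = rng.normal(size=(num_classes, dim))
        # Unit-normalized per-class prototype embeddings (stand-ins for
        # VLM text/image prototypes).
        self.protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
        self.momentum = momentum   # EMA factor for online prototype refinement
        self.k = k                 # threshold = running mean - k * running std
        self.sims = []             # running similarities for calibration

    def audit(self, emb, predicted_class):
        """Audit one test sample: return (suspicious, similarity, threshold)."""
        emb = emb / np.linalg.norm(emb)
        sim = float(emb @ self.protos[predicted_class])
        self.sims.append(sim)
        mu, sigma = np.mean(self.sims), np.std(self.sims)
        threshold = mu - self.k * sigma  # adaptively calibrated cutoff
        suspicious = sim < threshold
        if not suspicious:
            # Refine the prototype only on samples that pass the audit.
            p = (self.momentum * self.protos[predicted_class]
                 + (1 - self.momentum) * emb)
            self.protos[predicted_class] = p / np.linalg.norm(p)
        return suspicious, sim, threshold
```

Usage under these assumptions: embeddings close to a class prototype pass the audit and refine it, while a semantically inconsistent embedding (e.g. a backdoored input mapped far from its predicted class) falls below the calibrated threshold and is flagged.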