

Poster

TERD: A Unified Framework for Backdoor Defense on Diffusion Model

Yichuan Mo · Hui Huang · Mingjie Li · Ang Li · Yisen Wang


Abstract:

While diffusion models have achieved notable success in image generation, they remain highly vulnerable to backdoor attacks, which compromise their integrity by causing them to produce specific undesirable outputs when presented with a pre-defined trigger. In this paper, we investigate how to protect diffusion models from this threat. Specifically, we propose TERD, a backdoor defense framework built on a trigger reversion strategy executed in two sequential steps: an initial approximation of the trigger through replacement with a known distribution, followed by a refinement process using differential multi-step generations. Moreover, given the reversed trigger, we propose not only the first backdoor input detection approach for diffusion models but also a novel model detection algorithm that calculates the KL divergence between the reversed and benign distributions. Extensive evaluations demonstrate that TERD achieves a 100% True Positive Rate (TPR) and True Negative Rate (TNR) across datasets of varying resolutions and adapts to other Stochastic Differential Equation (SDE)-based models.
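To make the two detection ideas above more concrete, here is a minimal sketch built on simplifying assumptions rather than on TERD's actual implementation. It assumes the benign noise distribution is a standard normal N(0, 1) and that the reversed trigger can be summarized by a univariate Gaussian fitted to its values, so that the KL divergence has the closed form 0.5(σ² + μ² − 1 − ln σ²). The input-detection score is a hypothetical log-likelihood ratio between N(trigger, I) and N(0, I), shown only as one plausible way to use a reversed trigger. All function names and the `threshold` parameter are hypothetical.

```python
import math
import statistics
from typing import Sequence


def gaussian_kl_to_standard_normal(mu: float, sigma: float) -> float:
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for univariate Gaussians."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    var = sigma * sigma
    return 0.5 * (var + mu * mu - 1.0 - math.log(var))


def model_detection_score(reversed_trigger: Sequence[float]) -> float:
    """Hypothetical model-detection score: fit a univariate Gaussian to the
    reversed-trigger values and measure its KL divergence from benign N(0, 1)."""
    mu = statistics.fmean(reversed_trigger)
    sigma = statistics.pstdev(reversed_trigger, mu)
    return gaussian_kl_to_standard_normal(mu, sigma)


def is_backdoored(reversed_trigger: Sequence[float], threshold: float) -> bool:
    """Flag the model when the reversed distribution is far from the benign one.
    `threshold` is a hypothetical tuning parameter, not a value from the paper."""
    return model_detection_score(reversed_trigger) > threshold


def input_log_likelihood_ratio(x: Sequence[float], trigger: Sequence[float]) -> float:
    """Hypothetical input-detection score: log N(x; trigger, I) - log N(x; 0, I).
    The Gaussian normalizers cancel, leaving sum(x_i * t_i - t_i^2 / 2).
    Positive values mean x is better explained by the triggered distribution."""
    return sum(xi * ti - 0.5 * ti * ti for xi, ti in zip(x, trigger))


if __name__ == "__main__":
    benign_like = [-1.0, 1.0, -1.0, 1.0]   # mean 0, std 1: matches N(0, 1)
    shifted = [1.0, 3.0, 1.0, 3.0]         # mean 2, std 1: far from N(0, 1)
    print(model_detection_score(benign_like), model_detection_score(shifted))
    print(input_log_likelihood_ratio([2.0, 2.0], [2.0, 2.0]))
```

In this toy setup a reversed trigger whose values already look like N(0, 1) scores 0, while one shifted to mean 2 scores 2.0, so any threshold between the two separates them. Real diffusion-model triggers are high-dimensional images, so a practical detector would need per-dimension statistics and a calibrated threshold.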
