Poster in Workshop: 2nd Workshop on Generative AI and Law (GenLaw ’24)
Diffusion Unlearning Optimization for Robust and Safe Text-to-Image Models
Yong-Hyun Park · Sangdoo Yun · Jin-Hwa Kim · Junho Kim · Geonhui Jang · Yonghyun Jeong · Junghyo Jo · Gayoung Lee
Recently, as the performance of text-to-image models has significantly improved, there have been many concerns about their negative social impact. To address this problem, existing methods perform prompt-based unlearning, which removes unwanted concepts from the model while preserving its performance on non-target concepts. However, recent studies show that these methods are vulnerable to adversarial prompt attacks. In this paper, we propose a method that unlearns visual features instead of prompt-dependent parameters. Specifically, we apply Direct Preference Optimization (DPO) to guide the model to prefer generating the paired ground-truth images over images containing unsafe concepts. We show that our method is robust against the adversarial prompt attacks to which existing prompt-based unlearning methods are vulnerable.
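The abstract does not spell out the training objective, but one way to instantiate DPO over (safe, unsafe) image pairs for a noise-prediction diffusion model is a Diffusion-DPO-style loss. The PyTorch sketch below is illustrative only, not the authors' released code: it assumes an epsilon-prediction network with signature `net(x_t, t, cond)`, a precomputed `alphas_cumprod` noise schedule, and a frozen reference copy of the pretrained model; the helper names (`q_sample`, `denoise_error`) and the `beta` value are assumptions for this sketch.

```python
import torch
import torch.nn.functional as F

def q_sample(x0, t, noise, alphas_cumprod):
    # Standard DDPM forward process: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps.
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise

def denoise_error(net, x_t, t, cond, noise):
    # Per-sample MSE between the network's noise prediction and the true noise.
    return F.mse_loss(net(x_t, t, cond), noise, reduction="none").mean(dim=(1, 2, 3))

def diffusion_dpo_loss(model, ref_model, x_safe, x_unsafe, cond,
                       alphas_cumprod, beta=1000.0):
    # x_safe: paired ground-truth (preferred) latents; x_unsafe: latents of
    # images containing the concept to unlearn (dispreferred).
    t = torch.randint(0, alphas_cumprod.numel(), (x_safe.shape[0],),
                      device=x_safe.device)
    noise = torch.randn_like(x_safe)

    # Noise both members of the pair with the same timestep and noise sample
    # so their denoising errors are directly comparable.
    xt_safe = q_sample(x_safe, t, noise, alphas_cumprod)
    xt_unsafe = q_sample(x_unsafe, t, noise, alphas_cumprod)

    with torch.no_grad():  # frozen reference model, as in standard DPO
        ref_safe = denoise_error(ref_model, xt_safe, t, cond, noise)
        ref_unsafe = denoise_error(ref_model, xt_unsafe, t, cond, noise)
    cur_safe = denoise_error(model, xt_safe, t, cond, noise)
    cur_unsafe = denoise_error(model, xt_unsafe, t, cond, noise)

    # Implicit reward margin: the model should improve (lower error) on the
    # safe image and worsen on the unsafe one, relative to the reference.
    margin = (cur_safe - ref_safe) - (cur_unsafe - ref_unsafe)
    return -F.logsigmoid(-beta * margin).mean()
```

Because the objective compares denoising errors on images rather than penalizing a specific prompt, the preference signal targets the visual concept itself, which is consistent with the abstract's claim of robustness to adversarial prompt attacks.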