Poster in Workshop: Challenges in Deployable Generative AI
Understanding Data Replication in Diffusion Models
Gowthami Somepalli · Vasu Singla · Micah Goldblum · Jonas Geiping · Tom Goldstein
Keywords: [ memorization ]
Images generated by diffusion models like Stable Diffusion are increasingly widespread. Recent work and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user. In this paper, we first analyze this memorization problem in text-to-image diffusion models. Contrary to the prevailing belief that content replication stems solely from duplicated images in the training set, our findings highlight the equally significant role of text conditioning. Specifically, we observe that duplicating images together with their captions contributes to the memorization of training data, while duplicating images alone either fails to contribute to memorization or even reduces it in the cases we examined.