

Poster in Workshop: Challenges in Deployable Generative AI

Understanding Data Replication in Diffusion Models

Gowthami Somepalli · Vasu Singla · Micah Goldblum · Jonas Geiping · Tom Goldstein

Keywords: [ memorization ]


Abstract:

Images generated by diffusion models like Stable Diffusion are increasingly widespread. Recent works and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user. In this paper, we first analyze this memorization problem in text-to-image diffusion models. Contrary to the prevailing belief attributing content replication solely to duplicated images in the training set, our findings highlight the equally significant role of text conditioning in this phenomenon. Specifically, we observe that the combination of image and caption duplication contributes to the memorization of training data, while the sole duplication of images either fails to contribute or even diminishes the occurrence of memorization in the examined cases.
