
Understanding Data Replication in Diffusion Models
Gowthami Somepalli · Vasu Singla · Micah Goldblum · Jonas Geiping · Tom Goldstein
Event URL: https://openreview.net/forum?id=F9qCNPSzSY

Images generated by diffusion models like Stable Diffusion are increasingly widespread. Recent works and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user. In this paper, we first analyze this memorization problem in text-to-image diffusion models. Contrary to the prevailing belief that content replication is caused solely by duplicated images in the training set, our findings highlight the equally significant role of text conditioning. Specifically, we observe that duplicating both images and their captions contributes to the memorization of training data, while duplicating images alone either fails to increase memorization or even reduces it in the examined cases.

Author Information

Gowthami Somepalli (University of Maryland, College Park)
Vasu Singla (University of Maryland)
Micah Goldblum (New York University)
Jonas Geiping (University of Maryland, College Park)
Tom Goldstein (University of Maryland)
