The Journey, Not the Destination: How Data Guides Diffusion Models
Kristian Georgiev · Joshua Vendrow · Hadi Salman · Sung Min (Sam) Park · Aleksander Madry
Keywords:
data attribution
diffusion models
memorization
privacy
data valuation
influence estimation
Abstract
Diffusion-based generative models can synthesize photo-realistic images of unprecedented quality and diversity. However, attributing these images back to the training data (that is, identifying the specific training examples that caused the images to be generated) remains challenging. In this paper, we propose a framework that (i) formalizes data attribution in the context of diffusion models and (ii) provides a method for computing attributions efficiently. By applying our framework to CIFAR-10 and MS COCO, we uncover visually compelling attributions, which we validate through counterfactual analysis.