The Journey, Not the Destination: How Data Guides Diffusion Models
About
Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity. However, attributing these images back to the training data, that is, identifying the specific training examples that caused an image to be generated, remains a challenge. In this paper, we propose a framework that (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions. We then provide a method for computing these attributions efficiently. Finally, we apply our method to find (and evaluate) such attributions for denoising diffusion probabilistic models trained on CIFAR-10 and latent diffusion models trained on MS COCO. Code is available at https://github.com/MadryLab/journey-TRAK.
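The abstract's notion of counterfactually validating attributions can be illustrated schematically: rank training examples by their attribution scores, remove the top-scoring ones, retrain, and measure how much the model's output changes. The sketch below is a generic illustration of this idea, not the paper's actual pipeline; the function names (`counterfactual_validation`, `retrain_fn`, `eval_fn`) and the toy "model" are assumptions made for demonstration only.

```python
import numpy as np

def counterfactual_validation(scores, train_data, retrain_fn, eval_fn, k=10):
    """Schematic counterfactual check: drop the k most-attributed
    training examples, retrain, and measure the change in the output.
    A large change suggests the attribution scores were meaningful."""
    top_k = np.argsort(scores)[::-1][:k]          # indices of most influential examples
    mask = np.ones(len(train_data), dtype=bool)
    mask[top_k] = False                            # ablate the top-k examples
    model_full = retrain_fn(train_data)            # model trained on all data
    model_ablated = retrain_fn(train_data[mask])   # model trained without top-k
    return eval_fn(model_full) - eval_fn(model_ablated)

# Toy demonstration: the "model" is just the mean of the data, and the
# attribution score of each point is its distance from zero.
rng = np.random.default_rng(0)
data = rng.normal(size=100)
scores = np.abs(data)
delta = counterfactual_validation(
    scores, data,
    retrain_fn=lambda d: d.mean(),
    eval_fn=lambda m: m,
    k=10,
)
print(delta)
```

In the real setting, `retrain_fn` would retrain the diffusion model from scratch on the ablated dataset and `eval_fn` would score the generated image of interest, which is what makes efficient attribution methods necessary in practice.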
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Contributor Attribution | Fashion Product | Diversity | 13.58 | 48 |
| Contributor Attribution | ArtBench Post-Impressionism | Aesthetic Score | -11.94 | 36 |
| Contributor Attribution | CIFAR-20 | Inception Score | 10.8 | 32 |
| Contributor Attribution | ArtBench Post-Impressionism (test) | Aesthetic Score | -4.81 | 18 |
| Contributor Attribution | CIFAR-20 (test) | Inception Score | -1.67 | 16 |