
Your Diffusion Model is Secretly a Zero-Shot Classifier

About

The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive compositional generalization abilities. Almost all use cases thus far have solely focused on sampling; however, diffusion models can also provide conditional density estimates, which are useful for tasks beyond image generation. In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models. Although a gap remains between generative and discriminative approaches on zero-shot recognition tasks, our diffusion-based approach has significantly stronger multimodal compositional reasoning ability than competing discriminative approaches. Finally, we use Diffusion Classifier to extract standard classifiers from class-conditional diffusion models trained on ImageNet. Our models achieve strong classification performance using only weak augmentations and exhibit qualitatively better "effective robustness" to distribution shift. Overall, our results are a step toward using generative over discriminative models for downstream tasks. Results and visualizations at https://diffusion-classifier.github.io/
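The abstract does not spell out the classification procedure. Below is a minimal sketch, assuming a conditional noise-prediction model with the hypothetical signature `eps_model(x_t, t, label)`. For each candidate label, the input is noised at random timesteps, the model predicts the added noise under that label, and the label with the lowest average prediction error wins. The unweighted error is used here as a proxy for the negative ELBO on log p(x | label); with a uniform prior over labels, minimizing it approximates picking the most probable label. The same (t, ε) draws are reused for every label so that the comparison is not dominated by sampling noise. The helper names (`diffusion_classify`, `linear_alpha_bars`, `toy_eps_model`, `CLASS_MEANS`) are hypothetical. The linear beta schedule (1e-4 to 0.02 over 1000 steps) is an assumption rather than Stable Diffusion's actual schedule, and the toy denoiser stands in for a trained network.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Vector = List[float]
# Hypothetical denoiser signature: eps_model(x_t, t, label) -> predicted noise.
EpsModel = Callable[[Vector, int, str], Vector]


def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar_t for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def diffusion_classify(x0: Vector, labels: Sequence[str], eps_model: EpsModel,
                       alpha_bars: Sequence[float], num_samples: int = 64,
                       seed: int = 0) -> str:
    """Return the label with the lowest Monte Carlo noise-prediction error."""
    rng = random.Random(seed)
    # Draw (t, eps) pairs once and reuse them for every label.
    draws = []
    for _ in range(num_samples):
        t = rng.randrange(len(alpha_bars))
        eps = [rng.gauss(0.0, 1.0) for _ in x0]
        draws.append((t, eps))

    errors: Dict[str, float] = {}
    for label in labels:
        total = 0.0
        for t, eps in draws:
            a = alpha_bars[t]
            # Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
            x_t = [math.sqrt(a) * xi + math.sqrt(1.0 - a) * ei
                   for xi, ei in zip(x0, eps)]
            pred = eps_model(x_t, t, label)
            # Squared error between the true and the predicted noise
            total += sum((ei - pi) ** 2 for ei, pi in zip(eps, pred))
        errors[label] = total / num_samples
    return min(errors, key=errors.get)


# Toy stand-in for a trained conditional denoiser (hypothetical): it assumes
# the clean input for `label` is exactly CLASS_MEANS[label] and inverts the
# forward noising under that assumption.
CLASS_MEANS = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [-1.0, -1.0]}
ALPHA_BARS = linear_alpha_bars()


def toy_eps_model(x_t: Vector, t: int, label: str) -> Vector:
    a = ALPHA_BARS[t]
    mu = CLASS_MEANS[label]
    return [(xi - math.sqrt(a) * mi) / math.sqrt(1.0 - a)
            for xi, mi in zip(x_t, mu)]


print(diffusion_classify([0.9, 0.1], list(CLASS_MEANS), toy_eps_model,
                         ALPHA_BARS))  # prints "cat"
```

With this toy denoiser, the per-label error is proportional to the squared distance between the input and the label's mean, so the sketch reduces to a nearest-mean classifier. With a real text-conditioned model, `label` would be a class prompt, and the cost grows with the number of labels times the number of (t, ε) samples.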

Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak (2023)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet-A | Top-1 Accuracy | 16.2 | 654 |
| Image Classification | ImageNet V2 | Top-1 Accuracy | 80.6 | 611 |
| Image Classification | Food-101 | Accuracy | 77.7 | 542 |
| Image Classification | ImageNet-R | Top-1 Accuracy | 38.3 | 529 |
| Image Classification | CIFAR-10 | Accuracy | 88.5 | 508 |
| Image Classification | CIFAR-10 | Accuracy | 88.5 | 507 |
| Image Classification | ImageNet-Sketch | Top-1 Accuracy | 53.7 | 407 |
| Referring Expression Comprehension | RefCOCO+ (val) | Accuracy | 24.07 | 354 |
| Referring Expression Comprehension | RefCOCO (val) | Accuracy | 23.83 | 344 |
| Referring Expression Comprehension | RefCOCO (testA) | Accuracy | 0.2155 | 342 |

Showing 10 of 47 rows.
