Adversarial Diffusion Distillation
About
We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1-4 steps while maintaining high image quality. We use score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal in combination with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps. Our analyses show that our model clearly outperforms existing few-step methods (GANs, Latent Consistency Models) in a single step and reaches the performance of state-of-the-art diffusion models (SDXL) in only four steps. ADD is the first method to unlock single-step, real-time image synthesis with foundation models. Code and weights are available at https://github.com/Stability-AI/generative-models and https://huggingface.co/stabilityai/.
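The abstract describes a training signal that combines an adversarial loss with a score-distillation loss from a teacher model. A minimal sketch of how two such terms might be summed is shown below. It is not the authors' implementation: the hinge form of the adversarial term, the squared-distance form of the distillation term, the weighting parameter `lam`, and all function names are assumptions made for illustration, and plain Python lists stand in for tensors.

```python
def hinge_generator_loss(disc_logits):
    """Generator-side hinge loss (assumed form): the student is rewarded
    when the discriminator assigns high logits to its samples."""
    return -sum(disc_logits) / len(disc_logits)


def distillation_loss(student_pred, teacher_pred, weight=1.0):
    """Weighted mean squared distance between the student's output and the
    teacher's prediction (assumed distance; the abstract does not specify one)."""
    squared = [(s - t) ** 2 for s, t in zip(student_pred, teacher_pred)]
    return weight * sum(squared) / len(squared)


def combined_objective(disc_logits, student_pred, teacher_pred, lam=1.0):
    """Adversarial term plus lam times the distillation term.
    `lam` is a hypothetical trade-off weight, not a value from the source."""
    adv = hinge_generator_loss(disc_logits)
    distill = distillation_loss(student_pred, teacher_pred)
    return adv + lam * distill


if __name__ == "__main__":
    # Toy numbers only: two discriminator logits and two-element predictions.
    loss = combined_objective([0.5, -0.5], [1.0, 2.0], [1.0, 4.0], lam=0.5)
    print(loss)
```

In a real training loop the teacher's prediction would be treated as a fixed target (no gradient flows into the teacher), and the discriminator would be updated with its own loss in alternation with the student.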
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | GenEval | Overall Score | 55 | 467 |
| Text-to-Image Generation | GenEval | GenEval Score | 54 | 277 |
| Text-to-Image Generation | T2I-CompBench (test) | Color Accuracy | 61.49 | 67 |
| Text-to-Image Generation | GenEval 1.0 (test) | Overall Score | 47.66 | 63 |
| Text-to-Image Generation | MS COCO zero-shot | FID | 16.25 | 42 |
| Text-to-Image Generation | HPSv2 | HPSv2 Score | 29.93 | 35 |
| Text-to-Image Generation | OneIG-Bench | Alignment | 0.791 | 33 |
| Text-to-Image Generation | COCO 30k | FID | 23.19 | 29 |
| Text-to-Image Generation | COCO 2014 (val) | Precision | 65 | 25 |
| Text-to-Image Generation | MS-COCO 10K prompts 2014 (val) | FID | 26.7 | 19 |