One-step Diffusion with Distribution Matching Distillation
About
Diffusion models generate high-quality images but require dozens of forward passes. We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator with minimal impact on image quality. We enforce the one-step image generator to match the diffusion model at the distribution level by minimizing an approximate KL divergence whose gradient can be expressed as the difference between two score functions: one for the target distribution and one for the synthetic distribution produced by our one-step generator. The score functions are parameterized as two diffusion models trained separately on each distribution. Combined with a simple regression loss matching the large-scale structure of the multi-step diffusion outputs, our method outperforms all published few-step diffusion approaches, reaching 2.62 FID on ImageNet 64x64 and 11.49 FID on zero-shot COCO-30k, comparable to Stable Diffusion but orders of magnitude faster. Utilizing FP16 inference, our model generates images at 20 FPS on modern hardware.
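The core idea, that the KL gradient is the difference between the fake and real score functions, can be illustrated on a toy 1-D problem. The sketch below is not the paper's implementation: it replaces the two diffusion models with closed-form Gaussian scores (a hypothetical stand-in) and updates samples directly instead of a generator network.

```python
import numpy as np

# Toy sketch of the DMD gradient: the KL gradient w.r.t. a generated
# sample x is s_fake(x) - s_real(x), the difference between the scores
# of the generator's ("fake") distribution and the target ("real")
# distribution. Here both are 1-D Gaussians, so the scores are known
# in closed form; the paper instead parameterizes each score with a
# separately trained diffusion model.

def gaussian_score(x, mu, sigma):
    """Score (gradient of the log-density) of N(mu, sigma^2) at x."""
    return (mu - x) / sigma**2

def dmd_gradient(x, mu_real, sigma_real, mu_fake, sigma_fake):
    """Approximate KL gradient: s_fake(x) - s_real(x).
    Stepping x against this gradient moves the fake samples
    toward the real distribution."""
    return (gaussian_score(x, mu_fake, sigma_fake)
            - gaussian_score(x, mu_real, sigma_real))

# Gradient descent on toy "generator" samples drawn from N(2, 1),
# with the target distribution fixed at N(0, 1).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=1000)
for _ in range(200):
    g = dmd_gradient(x, mu_real=0.0, sigma_real=1.0,
                     mu_fake=x.mean(), sigma_fake=x.std())
    x = x - 0.05 * g  # samples drift toward the real distribution

print(x.mean(), x.std())  # mean approaches 0, std stays near 1
```

In the full method this gradient is backpropagated into the one-step generator's weights rather than applied to samples, and the regression loss mentioned above anchors the large-scale structure of the outputs.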
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Text-to-Image Generation | GenEval | GenEval Score 59 | 277 |
| Unconditional Image Generation | CIFAR-10 (test) | FID 2.62 | 216 |
| Unconditional Image Generation | CIFAR-10 | FID 3.77 | 171 |
| Class-conditional Image Generation | ImageNet 256x256 (test) | FID 1.71 | 167 |
| Unconditional Image Generation | CIFAR-10 unconditional | FID 3.77 | 159 |
| Image Generation | ImageNet 64x64 resolution (test) | FID 2.62 | 150 |
| Text-to-Image Generation | MS-COCO 2014 (val) | -- | 128 |
| Class-conditional Image Generation | ImageNet 64x64 | FID 2.62 | 126 |
| Image Generation | ImageNet 64x64 | FID 2.62 | 114 |
| Unconditional Generation | CIFAR-10 (test) | FID 2.66 | 102 |