Multistep Distillation of Diffusion Models via Moment Matching
About
We present a new method for making diffusion models faster to sample. The method distills many-step diffusion models into few-step models by matching conditional expectations of the clean data given noisy data along the sampling trajectory. Our approach extends recently proposed one-step methods to the multi-step case, and provides a new perspective by interpreting these approaches in terms of moment matching. Using up to 8 sampling steps, we obtain distilled models that outperform not only their one-step versions but also their original many-step teacher models, obtaining new state-of-the-art results on the ImageNet dataset. We also show promising results on a large text-to-image model, where we achieve fast generation of high-resolution images directly in image space, without needing autoencoders or upsamplers.
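The moment-matching idea can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the paper's implementation: the data is 1-D Gaussian so the teacher's conditional expectation E[x | z] has a closed form, the student is a one-step generator with a single learned shift parameter, and the update simply pulls student samples toward the teacher's denoising expectation of their re-noised versions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumption): 1-D data x ~ N(mu_data, 1). For Gaussian data the
# conditional expectation E[x | z] is available in closed form, so it stands
# in for a pretrained many-step teacher diffusion model.
mu_data = 2.0
sigma_t = 1.0  # noise level of the sampling step being distilled

def teacher_denoise(z):
    # Posterior mean E[x | z] for x ~ N(mu_data, 1) and z = x + sigma_t * eps.
    return (z / sigma_t**2 + mu_data) / (1.0 / sigma_t**2 + 1.0)

# Student (hypothetical parameterization): a one-step generator
# g(eps) = theta + eps, where only the shift theta is learned.
theta = -3.0  # deliberately poor initialization
lr = 0.5

for step in range(200):
    eps = rng.standard_normal(256)
    x_student = theta + eps                              # student samples
    z = x_student + sigma_t * rng.standard_normal(256)   # re-noise them
    # Moment-matching signal: move the student samples toward the teacher's
    # conditional expectation of clean data given the noised samples.
    grad = np.mean(x_student - teacher_denoise(z))
    theta -= lr * grad

# After training, the student's mean should land near the data mean mu_data.
print(theta)
```

In this toy case the update has a fixed point exactly where the student's first moment matches the data distribution's, which is the intuition behind matching E[x | z] along the sampling trajectory; the paper's actual method applies this with neural-network teachers and students over multiple noise levels.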
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Generation | ImageNet 64x64 resolution (test) | FID | 3 | 150 |
| Class-conditional Image Generation | ImageNet 64x64 | FID | 1.24 | 126 |
| Class-conditional Image Generation | ImageNet 64x64 (train test) | FID | 1.24 | 30 |
| Class-conditional Image Generation | ImageNet 128x128 | FID | 1.49 | 27 |
| Text-to-Image Generation | MS-COCO 512x512 zero-shot | -- | -- | 19 |