Multistep Distillation of Diffusion Models via Moment Matching
About
We present a new method for making diffusion models faster to sample. The method distills many-step diffusion models into few-step models by matching conditional expectations of the clean data given noisy data along the sampling trajectory. Our approach extends recently proposed one-step methods to the multi-step case, and provides a new perspective by interpreting these approaches in terms of moment matching. Using up to 8 sampling steps, we obtain distilled models that outperform not only their one-step versions but also their original many-step teacher models, obtaining new state-of-the-art results on the ImageNet dataset. We also show promising results on a large text-to-image model, where we achieve fast generation of high-resolution images directly in image space, without needing autoencoders or upsamplers.
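The moment-matching idea can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the paper's implementation: the data is 1-D Gaussian so the teacher's conditional expectation E[x | z] has a closed form, the student is a one-step generator with a single learned shift parameter, and the update simply pulls student samples toward the teacher's denoising expectation of their re-noised versions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumption): 1-D data x ~ N(mu_data, 1). For Gaussian data the
# conditional expectation E[x | z] is available in closed form, so it stands
# in for a pretrained many-step teacher diffusion model.
mu_data = 2.0
sigma_t = 1.0  # noise level of the sampling step being distilled

def teacher_denoise(z):
    # Posterior mean E[x | z] for x ~ N(mu_data, 1) and z = x + sigma_t * eps.
    return (z / sigma_t**2 + mu_data) / (1.0 / sigma_t**2 + 1.0)

# Student (hypothetical parameterization): a one-step generator
# g(eps) = theta + eps, where only the shift theta is learned.
theta = -3.0  # deliberately poor initialization
lr = 0.5

for step in range(200):
    eps = rng.standard_normal(256)
    x_student = theta + eps                              # student samples
    z = x_student + sigma_t * rng.standard_normal(256)   # re-noise them
    # Moment-matching signal: move the student samples toward the teacher's
    # conditional expectation of clean data given the noised samples.
    grad = np.mean(x_student - teacher_denoise(z))
    theta -= lr * grad

# After training, the student's mean should land near the data mean mu_data.
print(theta)
```

In this toy case the update has a fixed point exactly where the student's first moment matches the data distribution's, which is the intuition behind matching E[x | z] along the sampling trajectory; the paper's actual method applies this with neural-network teachers and students over multiple noise levels.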
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Generation | ImageNet 64x64 resolution (test) | FID | 3 | 150 |
| Class-conditional Image Generation | ImageNet 64x64 | FID | 1.24 | 126 |
| Class-conditional Image Generation | ImageNet 64x64 (train test) | FID | 1.24 | 30 |
| Class-conditional Image Generation | ImageNet 128x128 | FID | 1.49 | 27 |
| Text-to-Image Generation | MS-COCO 512x512 zero-shot | -- | -- | 19 |