Diffusion Models Are Innate One-Step Generators

About

Diffusion Models (DMs) have achieved great success in image generation and other fields. By fine sampling through the trajectory defined by the SDE/ODE solver based on a well-trained score model, DMs can generate remarkably high-quality results. However, this precise sampling often requires multiple steps and is computationally demanding. To address this problem, instance-based distillation methods have been proposed to distill a one-step generator from a DM by having a simpler student model mimic a more complex teacher model. Yet, our research reveals an inherent limitation in these methods: the teacher model, with more steps and more parameters, occupies different local minima compared to the student model, leading to suboptimal performance when the student model attempts to replicate the teacher. To avoid this problem, we introduce a novel distributional distillation method, which uses an exclusive distributional loss. This method exceeds state-of-the-art (SOTA) results while requiring significantly fewer training images. Additionally, we show that DMs' layers are differentially activated at different time steps, leading to an inherent capability to generate images in a single step. Freezing most of the convolutional layers in a DM during distributional distillation enables this innate capability and leads to further performance improvements. Our method achieves SOTA results on CIFAR-10 (FID 1.54), AFHQv2 64x64 (FID 1.23), FFHQ 64x64 (FID 0.85) and ImageNet 64x64 (FID 1.16) with great efficiency. Most of those results are obtained with only 5 million training images within 6 hours on 8 A100 GPUs.

Bowen Zheng, Tianming Yang • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Unconditional Image Generation | CIFAR-10 (test) | FID 1.54 | 216 |
| Unconditional Image Generation | CIFAR-10 unconditional | FID 1.54 | 159 |
| Class-conditional Image Generation | ImageNet 64x64 (test) | FID 1.16 | 86 |
| Conditional Image Generation | CIFAR-10 | FID 1.44 | 71 |
| Conditional Image Generation | CIFAR10 (test) | Fréchet Inception Distance 1.44 | 66 |
| Unconditional Image Generation | AFHQ 64x64 v2 (test) | FID 1.23 | 13 |
| Unconditional Image Generation | FFHQ 64x64 (test) | FID 0.85 | 10 |
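
For reference, the FID (Fréchet Inception Distance) values reported here are conventionally computed as the Fréchet distance between two Gaussians fitted to Inception-v3 features of real and generated images. The formula below is the standard definition, not something stated on this page; the symbols are assumptions for illustration: \mu_r, \Sigma_r denote the feature mean and covariance of the real images, and \mu_g, \Sigma_g those of the generated images. Lower values indicate a closer match.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

The exact feature extractor, sample counts, and reference statistics used for each benchmark row are not given here, so scores from different evaluation setups may not be directly comparable.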
