PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher
About
Diffusion models perform remarkably well at generating high-dimensional content but are computationally intensive, especially during training. We propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a novel pipeline that reduces training costs through three stages: training a diffusion model on downsampled data, distilling the pretrained diffusion model, and progressive super-resolution. With this pipeline, PaGoDA reduces the cost of training its diffusion model by $64\times$ by training on 8x downsampled data; at inference, with a single step, it achieves state-of-the-art performance on ImageNet across all resolutions from 64x64 to 512x512, and on text-to-image generation. PaGoDA's pipeline can also be applied directly in the latent space, adding compression alongside the pre-trained autoencoder in Latent Diffusion Models (e.g., Stable Diffusion). The code is available at https://github.com/sony/pagoda.
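As a rough sanity check on the numbers in the abstract, the sketch below works through the arithmetic. It assumes that the per-image cost scales with pixel count, which is an illustrative assumption and not a cost model stated above. It also assumes that the resolution doubles at each super-resolution stage, which is inferred only from the benchmark resolutions (64, 128, 256, 512). The function names `pixel_ratio` and `doubling_schedule` are hypothetical and do not come from the PaGoDA code base.

```python
def pixel_ratio(full_res: int, factor: int) -> int:
    """Ratio of pixel counts between a square image and its downsampled copy.

    Downsampling by `factor` along each side divides the pixel count
    by factor**2.
    """
    if factor <= 0 or full_res % factor != 0:
        raise ValueError("full_res must be a positive multiple of factor")
    low_res = full_res // factor
    return (full_res * full_res) // (low_res * low_res)


def doubling_schedule(base_res: int, target_res: int) -> list[int]:
    """Resolutions visited when doubling from base_res up to target_res.

    Assumption: each stage doubles the side length (not confirmed by the abstract).
    """
    schedule = [base_res]
    while schedule[-1] < target_res:
        schedule.append(schedule[-1] * 2)
    if schedule[-1] != target_res:
        raise ValueError("target_res must be base_res times a power of two")
    return schedule


if __name__ == "__main__":
    # 512x512 images downsampled 8x per side become 64x64 images.
    print(pixel_ratio(512, 8))
    print(doubling_schedule(64, 512))
```

Under these assumptions, 8x downsampling per side corresponds to a 64-fold reduction in pixels, which matches the $64\times$ figure quoted in the abstract.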
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Class-conditional Image Generation | ImageNet 256x256 | Inception Score (IS) | 259.6 | 441 |
| Image Generation | ImageNet 256x256 (val) | FID | 1.56 | 307 |
| Image Generation | ImageNet 512x512 (val) | FID-50K | 1.8 | 184 |
| Image Generation | ImageNet 64x64 (train val) | FID | 1.21 | 83 |
| Image Generation | ImageNet 128x128 (val) | FID | 1.48 | 15 |