
Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

About

Consistency models (CMs) are a powerful class of diffusion-based generative models optimized for fast sampling. Most existing CMs are trained using discretized timesteps, which introduce additional hyperparameters and are prone to discretization errors. While continuous-time formulations can mitigate these issues, their success has been limited by training instability. To address this, we propose a simplified theoretical framework that unifies previous parameterizations of diffusion models and CMs, identifying the root causes of instability. Based on this analysis, we introduce key improvements in diffusion process parameterization, network architecture, and training objectives. These changes enable us to train continuous-time CMs at an unprecedented scale, reaching 1.5B parameters on ImageNet 512x512. Our proposed training algorithm, using only two sampling steps, achieves FID scores of 2.06 on CIFAR-10, 1.48 on ImageNet 64x64, and 1.88 on ImageNet 512x512, narrowing the gap in FID scores with the best existing diffusion models to within 10%.
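The abstract's "only two sampling steps" refers to the standard consistency-model sampling loop: a consistency function maps any noisy sample directly to a clean-data estimate, so generation needs just one or two network evaluations instead of a long diffusion trajectory. A minimal sketch of that two-step procedure is below; the function names, the toy consistency function, and the specific noise levels are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def two_step_sample(consistency_fn, shape, sigma_max=80.0, sigma_mid=0.8, rng=None):
    """Generic two-step consistency sampling (sketch; sigma values are illustrative).

    consistency_fn(x, sigma) is assumed to map a sample noised at level sigma
    directly to an estimate of the clean data.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: map pure noise at the highest noise level straight to a data estimate.
    x = sigma_max * rng.standard_normal(shape)
    x0 = consistency_fn(x, sigma_max)
    # Step 2: re-noise to an intermediate level, then denoise once more.
    x = x0 + sigma_mid * rng.standard_normal(shape)
    return consistency_fn(x, sigma_mid)

# Toy stand-in for a trained consistency network: shrinks toward the data mean (zero).
toy_fn = lambda x, sigma: x / (1.0 + sigma**2) ** 0.5
sample = two_step_sample(toy_fn, shape=(4,), rng=np.random.default_rng(0))
```

In a real CM the second step trades a small amount of speed for sample quality; one-step sampling simply stops after the first `consistency_fn` call.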

Cheng Lu, Yang Song • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Generation | ImageNet 512x512 (val) | FID-50K | 1.88 | 184 |
| Unconditional Image Generation | CIFAR-10 | FID | 2.97 | 171 |
| Class-conditional Image Generation | ImageNet 256x256 (test) | FID | 1.94 | 167 |
| Unconditional Image Generation | CIFAR-10 unconditional | FID | 2.06 | 159 |
| Unconditional Generation | CIFAR-10 (test) | FID | 2.97 | 102 |
| Unconditional Image Generation | CIFAR-10 32x32 (test) | FID | 2.97 | 94 |
| Class-conditional Image Generation | ImageNet 64x64 (test) | FID | 1.48 | 86 |
| Class-conditional Image Generation | ImageNet 512x512 (train) | FID | 1.88 | 52 |
| Image Generation | CIFAR-10 unconditional (test) | FID | 2.97 | 39 |
| Image Generation | ImageNet 512x512 | FID | 4.29 | 34 |

Showing 10 of 19 rows
