
Consistency Models Made Easy

About

Consistency models (CMs) offer faster sampling than traditional diffusion models, but their training is resource-intensive. For example, as of 2024, training a state-of-the-art CM on CIFAR-10 takes one week on 8 GPUs. In this work, we propose an effective scheme for training CMs that largely improves the efficiency of building such models. Specifically, by expressing CM trajectories via a particular differential equation, we argue that diffusion models can be viewed as a special case of CMs. We can thus fine-tune a consistency model starting from a pretrained diffusion model and progressively approximate the full consistency condition to stronger degrees over the training process. Our resulting method, which we term Easy Consistency Tuning (ECT), achieves vastly reduced training times while improving upon the quality of previous methods: for example, ECT achieves a 2-step FID of 2.73 on CIFAR-10 within 1 hour on a single A100 GPU, matching Consistency Distillation trained for hundreds of GPU hours. Owing to this computational efficiency, we investigate the scaling laws of CMs under ECT, showing that they obey the classic power law scaling, hinting at their ability to improve efficiency and performance at larger scales. Our code (https://github.com/locuslab/ect) is publicly available, making CMs more accessible to the broader community.
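The fine-tuning scheme the abstract describes can be sketched in toy form: match the model's output at time t against a stop-gradient target at an earlier time r = t - Δt on the same noising trajectory, and anneal Δt from near zero (diffusion-like training) toward the full interval (the full consistency condition). Everything below is illustrative, not the authors' implementation: the linear stand-in model, the simple noising process, and the linear Δt schedule are all assumptions made for a self-contained sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(theta, x, t):
    # Toy "consistency model": a linear map of (x, t), standing in for a
    # neural network initialized from a pretrained diffusion model.
    return theta[0] * x + theta[1] * t

def ect_step(theta, x0, t, dt, lr=1e-2):
    # One ECT-style update (schematic): push f(x_t, t) toward a
    # stop-gradient target f(x_r, r) at an earlier time r = t - dt on
    # the SAME trajectory (shared noise sample).
    r = max(t - dt, 0.0)
    noise = rng.standard_normal()
    x_t = x0 + t * noise          # simple noising process (assumption)
    x_r = x0 + r * noise          # same noise: points lie on one trajectory
    target = f(theta, x_r, r)     # plain float, so no gradient flows through it
    pred = f(theta, x_t, t)
    err = pred - target
    grad = np.array([err * x_t, err * t])  # d(0.5 * err**2) / d(theta)
    return theta - lr * grad

theta = np.array([1.0, 0.0])
total_steps = 1000
for step in range(total_steps):
    # Anneal the gap: dt -> 0 recovers diffusion-style training; dt -> t
    # enforces the full consistency condition (hypothetical linear schedule).
    dt = (step + 1) / total_steps
    theta = ect_step(theta, x0=rng.standard_normal(), t=1.0, dt=dt)
```

The key design choice mirrored here is that the target is computed at a nearby point on the same trajectory and treated as a constant, so early training (tiny Δt) behaves like ordinary diffusion training, and the consistency constraint is tightened only gradually.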

Zhengyang Geng, Ashwini Pokle, William Luo, Justin Lin, J. Zico Kolter • 2024

Related benchmarks

Task                                 Dataset                       Result     Rank
-----------------------------------  ----------------------------  ---------  ----
Unconditional Image Generation       CIFAR-10 (test)               FID 2.11   216
Unconditional Image Generation       CIFAR-10                      FID 3.6    171
Unconditional Image Generation       CIFAR-10 unconditional        FID 2.11   159
Image Generation                     ImageNet 64x64 (test)         FID 1.67   150
Class-conditional Image Generation   ImageNet 64x64                FID 4.05   126
Unconditional Generation             CIFAR-10 (test)               FID 3.6    102
Unconditional Image Generation       CIFAR-10 32x32 (test)         FID 3.6    94
Class-conditional Image Generation   ImageNet 64x64 (test)         FID 1.5    86
Class-conditional Image Generation   ImageNet 512x512 (train)      FID 3.38   52
Image Generation                     CIFAR-10 unconditional (test) FID 2.11   39

(Showing 10 of 13 rows)
