DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics

About

Diffusion probabilistic models (DPMs) have exhibited excellent performance for high-fidelity image generation but suffer from inefficient sampling. Recent works accelerate sampling with fast ODE solvers that exploit the specific ODE form of DPMs. However, these solvers rely heavily on a specific parameterization during inference (such as noise or data prediction), which might not be the optimal choice. In this work, we propose a novel formulation of the optimal parameterization during sampling, one that minimizes the first-order discretization error of the ODE solution. Based on this formulation, we propose DPM-Solver-v3, a new fast ODE solver for DPMs that introduces several coefficients efficiently computed on the pretrained model, which we call empirical model statistics. We further incorporate multistep methods and a predictor-corrector framework, and propose techniques for improving sample quality at small numbers of function evaluations (NFE) or large guidance scales. Experiments show that DPM-Solver-v3 achieves consistently better or comparable performance in both unconditional and conditional sampling with both pixel-space and latent-space DPMs, especially at 5 to 10 NFEs. We achieve FIDs of 12.21 (5 NFE) and 2.51 (10 NFE) on unconditional CIFAR10, and an MSE of 0.55 (5 NFE, 7.5 guidance scale) on Stable Diffusion, bringing a speed-up of 15% to 30% compared to previous state-of-the-art training-free methods. Code is available at https://github.com/thu-ml/DPM-Solver-v3.
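For context, here is a minimal sketch of the kind of first-order diffusion ODE solver step that this line of work builds on: the noise-prediction exponential-integrator update, which is equivalent to deterministic DDIM. It is not DPM-Solver-v3 and does not use empirical model statistics. The cosine schedule, the toy point-mass "model", and all function names are assumptions made for illustration only. The sketch shows what one function evaluation (one NFE) buys in a sampling loop.

```python
import math
import random

def alpha_sigma(t):
    """Toy variance-preserving schedule (an assumption): alpha^2 + sigma^2 = 1."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)

def half_log_snr(t):
    """lambda_t = log(alpha_t / sigma_t)."""
    a, s = alpha_sigma(t)
    return math.log(a / s)

def first_order_step(x, s, t, eps_model):
    """One noise-prediction step from time s to time t < s (one model call)."""
    a_s, _ = alpha_sigma(s)
    a_t, sig_t = alpha_sigma(t)
    h = half_log_snr(t) - half_log_snr(s)  # step size in half-log-SNR
    eps = eps_model(x, s)                  # the single function evaluation
    return (a_t / a_s) * x - sig_t * math.expm1(h) * eps

def sample(x_T, eps_model, nfe, t_start=0.99, t_end=0.01):
    """Run `nfe` first-order steps on a uniform time grid."""
    ts = [t_start + (t_end - t_start) * i / nfe for i in range(nfe + 1)]
    x = x_T
    for s, t in zip(ts[:-1], ts[1:]):
        x = first_order_step(x, s, t, eps_model)
    return x

# Hypothetical stand-in for a pretrained network: the exact noise predictor
# when all data sits at the single point MU.
MU = 2.0
calls = {"n": 0}

def toy_eps_model(x, t):
    calls["n"] += 1
    a, s = alpha_sigma(t)
    return (x - a * MU) / s

random.seed(0)
x_T = random.gauss(0.0, 1.0)
x_0 = sample(x_T, toy_eps_model, nfe=10)
```

With the toy model above, `x_0` should land near `MU` after 10 model calls. For a real network the first-order update incurs discretization error at each step, which is the error the abstract's choice of parameterization targets.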

Kaiwen Zheng, Cheng Lu, Jianfei Chen, Jun Zhu • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Image Generation | CIFAR-10 (test) | FID | 7.83 | 483 |
| Image Generation | ImageNet 256 10k samples | FID | 7.39 | 165 |
| Unconditional Image Generation | CIFAR-10 unconditional | FID | 2.24 | 165 |
| Text-to-Image Generation | MS-COCO 2014 (val) | FID | 15.13 | 137 |
| Conditional Image Generation | ImageNet 256x256 Guided-Diffusion (10k samples) | FID | 7.23 | 128 |
| Text-to-Image Generation | Stable Diffusion 10k samples v1.4 | CLIP Similarity | 99.38 | 119 |
| Image Generation | LSUN church | FID | 10.69 | 117 |
| Image Generation | CIFAR10 50k samples (test) | FID | 2 | 81 |
| Image Generation | CelebA | FID | 5.4 | 65 |
| Text-to-Image Generation | Stable Diffusion 10k samples v1.4 (test) | RMSE Loss | 0.0381 | 44 |
Showing 10 of 15 rows
