
Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models

About

Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleration methods often suffer image-quality degradation under a low-latency budget. In this paper, we propose the Ensemble Parallel Direction solver (dubbed EPD), a novel ODE solver that mitigates truncation errors by incorporating multiple parallel gradient evaluations in each ODE step. Importantly, since the additional gradient computations are independent, they can be fully parallelized, preserving low-latency sampling. Our method optimizes a small set of learnable parameters in a distillation fashion, ensuring minimal training overhead. In addition, our method can serve as a plugin to improve existing ODE samplers. Extensive experiments on various image synthesis benchmarks demonstrate the effectiveness of our EPD in achieving high-quality and low-latency sampling. For example, at the same latency level of 5 NFE, EPD achieves an FID of 4.47 on CIFAR-10, 7.97 on FFHQ, 8.17 on ImageNet, and 8.26 on LSUN Bedroom, surpassing existing learning-based solvers by a significant margin. Code is available at https://github.com/BeierZhu/EPD.
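To make the core idea concrete, here is a minimal sketch of a single ODE step that combines several independent gradient evaluations. This is an illustrative toy, not the paper's actual parameterization: the intermediate time fractions `c` and combination weights `w` stand in for the small set of learnable parameters mentioned in the abstract, and `eps_fn` stands in for the denoiser-derived drift of the probability-flow ODE.

```python
import numpy as np

def epd_style_step(x, t, t_next, eps_fn, c, w):
    """Advance x from t to t_next using K gradient evaluations.

    c: intermediate time fractions in [0, 1], shape (K,)   (assumed learnable)
    w: combination weights, shape (K,)                     (assumed learnable)
    All K evaluations depend only on the same (x, t), so they are
    independent and could be dispatched in parallel.
    """
    h = t_next - t
    # Independent evaluations at K intermediate times along the step.
    grads = [eps_fn(x, t + ci * h) for ci in c]
    # Learned convex combination replaces a single Euler direction.
    d = sum(wi * g for wi, g in zip(w, grads))
    return x + h * d

# Toy check on dx/dt = -x, integrating backward from t=1.0 to t=0.5.
eps_fn = lambda x, t: -x
x0 = np.array([1.0])
x1 = epd_style_step(x0, t=1.0, t_next=0.5, eps_fn=eps_fn,
                    c=np.array([0.0, 1.0]), w=np.array([0.5, 0.5]))
```

Because `eps_fn` here ignores `t`, the step reduces to plain Euler; with a real denoiser the K evaluations differ, and distillation would tune `c` and `w` to cancel truncation error against a high-NFE teacher trajectory.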

Beier Zhu, Ruoyu Wang, Tong Zhao, Hanwang Zhang, Chi Zhang • 2025

Related benchmarks

Task                               | Dataset                | Metric | Result | Rank
-----------------------------------|------------------------|--------|--------|-----
Image Generation                   | CIFAR-10               | FID    | 4.33   | 203
Text-to-Image Generation           | MS-COCO (val)          | FID    | 13.14  | 202
Class-conditional Image Generation | ImageNet 64x64         | FID    | 5.26   | 156
Image Generation                   | CIFAR-10 32x32         | FID    | 2.88   | 147
Unconditional Image Generation     | CIFAR-10 32x32 (test)  | FID    | 2.42   | 137
Image Generation                   | LSUN bedroom           | FID    | 7.52   | 105
Image Generation                   | ImageNet 64            | FID    | 6.35   | 100
Conditional Image Generation       | ImageNet 64x64 (val)   | FID    | 4.02   | 87
Image Generation                   | FFHQ 64x64             | FID    | 5.11   | 76
Image Generation                   | FFHQ                   | FID    | 7.84   | 70

(Showing 10 of 19 rows)
