DeepCache: Accelerating Diffusion Models for Free

About

Diffusion models have recently gained unprecedented attention in the field of image synthesis due to their remarkable generative capabilities. Notwithstanding their prowess, these models often incur substantial computational costs, primarily attributed to the sequential denoising process and cumbersome model size. Traditional methods for compressing diffusion models typically involve extensive retraining, presenting cost and feasibility challenges. In this paper, we introduce DeepCache, a novel training-free paradigm that accelerates diffusion models from the perspective of model architecture. DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models, which caches and retrieves features across adjacent denoising stages, thereby curtailing redundant computations. Utilizing the property of the U-Net, we reuse the high-level features while updating the low-level features in a very cheap way. This innovative strategy, in turn, enables a speedup factor of 2.3$\times$ for Stable Diffusion v1.5 with only a 0.05 decline in CLIP Score, and 4.1$\times$ for LDM-4-G with a slight decrease of 0.22 in FID on ImageNet. Our experiments also demonstrate DeepCache's superiority over existing pruning and distillation methods that necessitate retraining and its compatibility with current sampling techniques. Furthermore, we find that under the same throughput, DeepCache effectively achieves comparable or even marginally improved results with DDIM or PLMS. The code is available at https://github.com/horseee/DeepCache

Xinyin Ma, Gongfan Fang, Xinchao Wang• 2023

Related benchmarks

Task	Dataset	Result
Class-conditional Image Generation	ImageNet 256x256	Inception Score (IS)202.8	967
Class-conditional Image Generation	ImageNet 256x256 (train)	IS204.1	367
Text-to-Image Generation	MS-COCO (val)	FID7.36	202
Image Generation	CIFAR-10 32x32	FID4.35	151
Text-to-Image Generation	MS-COCO	FID23.45	145
Text-to-Image Generation	MS-COCO 2014 (val)	FID19.44	143
Text-to-Image Generation	MS-COCO 2017 (val)	FID21.53	131
Unconditional Image Generation	CIFAR-10 32 x 32	FID4.7	71
Unconditional Image Generation	LSUN Bedroom 256x256	FID6.69	68
Text-to-Image Generation	MS COCO 2017	FID29.61	41

Showing 10 of 22 rows

Other info

Code

Follow for update

@wizwand_team Discord