Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning
About
Diffusion models are trained by learning a sequence of models that reverse each step of noise corruption. Typically, the model parameters are fully shared across all timesteps to improve training efficiency. However, since the denoising tasks differ at each timestep, the gradients computed at different timesteps may conflict, potentially degrading image generation quality. To address this issue, this work proposes a **De**couple-then-**Me**rge (**DeMe**) framework, which begins with a pretrained model and finetunes separate models tailored to specific timesteps. We introduce several improved techniques during the finetuning stage to promote effective knowledge sharing while minimizing training interference across timesteps. After finetuning, these separate models are merged into a single model in parameter space, ensuring efficient and practical inference. Experimental results show significant generation quality improvements on six benchmarks, including Stable Diffusion on COCO30K, ImageNet1K, and PartiPrompts, and DDPM on LSUN Church, LSUN Bedroom, and CIFAR10. Code is available at [GitHub](https://github.com/MqLeet/DeMe).
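The merge step combines the separately finetuned models in parameter space so that inference uses a single network. The paper's exact merging rule is not reproduced here; the sketch below assumes a simple weighted average of corresponding parameters (a common parameter-space merging baseline), with plain Python dicts of float lists standing in for real model state dicts.

```python
# Hypothetical sketch of the DeMe "merge" step: models finetuned on
# different timestep ranges are combined in parameter space.
# The uniform-average rule and the toy state-dict format are assumptions.

def merge_models(state_dicts, weights=None):
    """Merge models (dicts of name -> list of floats) by weighted
    averaging of corresponding parameters."""
    if weights is None:
        # Default: uniform average over all finetuned models.
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged

# Toy example: two "models" finetuned on early vs. late timesteps.
m_early = {"conv.weight": [1.0, 2.0]}
m_late = {"conv.weight": [3.0, 4.0]}
merged = merge_models([m_early, m_late])
print(merged["conv.weight"])  # [2.0, 3.0]
```

After merging, a single model serves all timesteps, so inference cost is identical to the original pretrained model.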
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Unconditional Image Generation | CIFAR-10 (test) | FID | 3.51 | 216 |
| Image Generation | CIFAR10 32x32 (test) | FID | 3.51 | 154 |
| Text-to-Image Generation | MS-COCO | FID | 12.78 | 75 |
| Image Generation | LSUN Church 256x256 (test) | FID | 7.27 | 55 |
| Text-to-Image Generation | PartiPrompts | CLIP Score | 30.02 | 26 |
| Unconditional Image Generation | LSUN Church (test) | FID | 7.27 | 17 |
| Unconditional Image Generation | LSUN Bedroom (test) | FID | 5.84 | 14 |
| Text-to-Image Generation | ImageNet | FID | 26.36 | 9 |
| Image Generation | LSUN-Bedroom 256x256 (test val) | FID | 5.84 | 5 |