FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

About

Recently, conditional diffusion models have gained popularity in numerous applications due to their exceptional generation ability. However, many existing methods require training: they need a time-dependent classifier or a condition-dependent score estimator to be trained, which increases the cost of building conditional diffusion models and makes them inconvenient to transfer across different conditions. Some recent works address this limitation with training-free solutions, but most apply only to a specific category of tasks rather than to more general conditions. In this work, we propose a training-Free conditional Diffusion Model (FreeDoM) that handles various conditions. Specifically, we leverage off-the-shelf pre-trained networks, such as a face detection model, to construct time-independent energy functions that guide the generation process without any training. Because the energy function can be constructed flexibly and adapted to various conditions, FreeDoM has a broader range of applications than existing training-free methods. FreeDoM is simple, effective, and low-cost. Experiments demonstrate that FreeDoM is effective for various conditions and suitable for diffusion models of diverse data domains, including image and latent code domains.
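The abstract describes guiding sampling with a time-independent energy built from a frozen, off-the-shelf network. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a deterministic DDIM-style sampler, a clean-sample estimate `x0_hat` computed from the predicted noise, and a toy Gaussian data model so the noise predictor is exact. The names `eps_model`, the linear read-out `A` (a hypothetical stand-in for a pre-trained detector or classifier), the constant step size `rho`, and the finite-difference gradient (a stand-in for autodiff) are all illustrative assumptions.

```python
import numpy as np

# Toy assumption: data x0 ~ N(0, I), so the ideal noise predictor is
# E[eps | x_t] = sqrt(1 - abar_t) * x_t. A real model would be a trained network.
def eps_model(x_t, abar_t):
    return np.sqrt(1.0 - abar_t) * x_t

def estimate_x0(x_t, abar_t):
    """Estimate the clean sample from x_t and the predicted noise."""
    eps = eps_model(x_t, abar_t)
    return (x_t - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)

# Hypothetical frozen "off-the-shelf" network: a fixed linear read-out of the
# first coordinate. It is never trained or fine-tuned in this sketch.
A = np.array([[1.0, 0.0]])

def energy(x_t, abar_t, c):
    """Time-independent energy: squared distance between c and A(x0_hat)."""
    x0_hat = estimate_x0(x_t, abar_t)
    return float(np.sum((A @ x0_hat - c) ** 2))

def energy_grad(x_t, abar_t, c, h=1e-4):
    """Central finite differences w.r.t. x_t (stand-in for autodiff)."""
    g = np.zeros_like(x_t)
    for i in range(x_t.size):
        e = np.zeros_like(x_t)
        e[i] = h
        g[i] = (energy(x_t + e, abar_t, c) - energy(x_t - e, abar_t, c)) / (2 * h)
    return g

def sample(x_T, c=None, rho=0.1, num_steps=50):
    """Deterministic DDIM sampling with an optional energy-guidance step."""
    thetas = np.linspace(1.5, 0.0, num_steps + 1)
    abar = np.cos(thetas) ** 2  # abar[0] ~ 0.005 (noisy), abar[-1] == 1 (clean)
    x = x_T.astype(float)
    for t in range(num_steps):
        ab_t, ab_prev = abar[t], abar[t + 1]
        eps = eps_model(x, ab_t)
        x0_hat = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        x_prev = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps
        if c is not None:
            # Guidance: step against the energy gradient taken w.r.t. x_t.
            x_prev = x_prev - rho * energy_grad(x, ab_t, c)
        x = x_prev
    return x

x_T = np.array([-1.0, 0.5])
target = np.array([2.0])
print(sample(x_T), sample(x_T, c=target))
```

In this toy run, the unguided sample keeps its first coordinate near -1, while the guided sample's first coordinate is pulled toward the condition value 2. The second coordinate is unchanged because the read-out `A` ignores it. Only the frozen network and the energy change when the condition changes, which mirrors the "no training" claim above.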

Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, Jian Zhang • 2023

Related benchmarks

Task	Dataset	Metric	Result
Class-conditional Image Generation	ImageNet	FID	200	189
Conditional Image Generation	CIFAR-10	FID	135	88
Aesthetic Image Generation	FLUX	Aesthetic Score	6.8406	22
Text-aligned Image Generation	FLUX	PickScore	0.2133	22
Text-to-Image Generation	Pick-a-Pic (val)	PickScore	22.13	20
Aesthetic Image Generation	Z-Image	Aesthetic Score	6.0753	20
Text-to-Image Generation	Z-Image	PickScore	0.2155	20
Conditional Image Generation	CelebA-HQ Gender+Age	Accuracy	68.7	15
Conditional Image Generation	CelebA-HQ Gender+Hair	Accuracy	67.1	15
Text-to-Image Generation	Pick-a-Pic, HPSv2, and PartiPrompts (test)	PickScore	22.13	12
