LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis

About

Diffusion models have demonstrated impressive abilities in generating photo-realistic and creative images. To make the generation process more controllable, existing studies, termed early-constraint methods in this paper, incorporate extra conditions into pre-trained diffusion models. Some of them adopt condition-specific modules to handle each condition separately, and so struggle to generalize to other conditions. Although follow-up studies present unified solutions to the generalization problem, they require extra resources to implement, e.g., additional inputs or parameter optimization; more flexible and efficient solutions are still needed for steerable guided image synthesis. In this paper, we present an alternative paradigm, Late-Constraint Diffusion (LaCon), which simultaneously integrates various conditions into pre-trained diffusion models. Specifically, LaCon establishes an alignment between the external condition and the internal features of diffusion models, and uses this alignment to incorporate the target condition, guiding the sampling process to produce tailored results. Experimental results on the COCO dataset illustrate the effectiveness and superior generalization capability of LaCon under various conditions and settings. Ablation studies investigate the roles of the different components of LaCon and illustrate its potential to serve as an efficient solution offering flexible controllability for diffusion models.
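
The abstract's core mechanism (align a condition with internal features, then use that alignment to steer sampling) can be illustrated with a toy sketch. This is not the authors' implementation: the linear "feature extractor" `W_feat`, the linear "adapter" `A_adapter`, and the functions `predict_condition`, `alignment_loss`, `guidance_grad` and `guided_sampling` are all hypothetical stand-ins, and the denoising update a real sampler would interleave with the guidance step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins (assumptions, not from the paper):
# W_feat plays the role of the frozen diffusion model's internal features,
# A_adapter plays the role of a module aligning features with a condition.
W_feat = rng.normal(size=(8, 16))
A_adapter = rng.normal(size=(4, 8))
M = A_adapter @ W_feat  # combined linear map: latent -> predicted condition


def predict_condition(x):
    """Predict the condition from a latent via features, then the adapter."""
    return A_adapter @ (W_feat @ x)


def alignment_loss(x, target):
    """Half squared distance between predicted and target condition."""
    r = predict_condition(x) - target
    return 0.5 * float(r @ r)


def guidance_grad(x, target):
    """Analytic gradient of alignment_loss with respect to the latent x."""
    return M.T @ (M @ x - target)


def guided_sampling(x, target, steps=50):
    """Repeatedly nudge the latent toward the target condition.

    A real sampler would interleave each nudge with a denoising update;
    that part is left out of this toy example.
    """
    # Step size 1/L, where L is the squared spectral norm of M,
    # guarantees the quadratic loss does not increase.
    scale = 1.0 / np.linalg.norm(M, 2) ** 2
    for _ in range(steps):
        x = x - scale * guidance_grad(x, target)
    return x


x0 = np.ones(16)
target = np.array([1.0, -2.0, 0.5, 3.0])
x1 = guided_sampling(x0, target)
print(alignment_loss(x0, target), alignment_loss(x1, target))
```

Because the constraint enters only through a gradient at sampling time, the stand-in "model" weights are never updated, which is the property the abstract contrasts with early-constraint methods.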

Chang Liu, Rui Li, Kaidong Zhang, Xin Luo, Dong Liu • 2023

Related benchmarks

Task | Dataset | Result | Rank
---- | ------- | ------ | ----
Controllable Image Generation | COCO (test) | Inference Latency (s): 6.29 | 14
Conditional Image Generation (HED Edge) | COCO 5,000 samples 2017 (val) | FID: 21.02 | 6
Conditional Image Generation (Color Stroke) | COCO 5,000 samples 2017 (val) | FID: 20.27 | 3
Conditional Image Generation (Image Palette) | COCO 5,000 samples 2017 (val) | FID: 20.61 | 3
Conditional Image Generation (Binary Mask) | COCO 5,000 samples 2017 (val) | FID: 20.94 | 1
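
For readers unfamiliar with the FID metric reported above (lower is better), the following is a minimal sketch of the Fréchet distance between two Gaussians fitted to feature sets. It assumes the features have already been extracted (FID conventionally uses Inception-v3 activations); the function names `frechet_distance` and `fid_from_features` and the random arrays are illustrative only and do not reproduce the evaluation pipeline behind the numbers in the table.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    d^2 = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})
    """
    diff = mu1 - mu2
    # For positive semi-definite covariances, Tr((sigma1 sigma2)^{1/2}) equals
    # the sum of square roots of the eigenvalues of sigma1 @ sigma2.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


def fid_from_features(feats_a, feats_b):
    """Fit a Gaussian to each (n_samples, n_dims) feature array and compare."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    sigma_a = np.cov(feats_a, rowvar=False)
    sigma_b = np.cov(feats_b, rowvar=False)
    return frechet_distance(mu_a, sigma_a, mu_b, sigma_b)


# Illustrative random "features" standing in for real and generated images.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
fake = rng.normal(loc=0.5, size=(200, 4))
print(fid_from_features(real, fake))
```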
