Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

About

Recent controllable generation approaches such as FreeControl and Diffusion Self-Guidance bring fine-grained spatial and appearance control to text-to-image (T2I) diffusion models without training auxiliary modules. However, these methods optimize the latent embedding for each type of score function with longer diffusion steps, which makes generation time-consuming and limits their flexibility and use. This work presents Ctrl-X, a simple framework for T2I diffusion that controls structure and appearance without additional training or guidance. Ctrl-X introduces feed-forward structure control, which aligns the output with a structure image, and semantic-aware appearance transfer, which transfers appearance from a user-input image. Extensive qualitative and quantitative experiments illustrate the superior performance of Ctrl-X on various condition inputs and model checkpoints. In particular, Ctrl-X supports novel structure and appearance control with arbitrary condition images of any modality, exhibits superior image quality and appearance transfer compared to existing works, and provides instant plug-and-play functionality to any T2I and text-to-video (T2V) diffusion model. See our project page for an overview of the results: https://genforce.github.io/ctrl-x
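
The abstract does not spell out how semantic-aware appearance transfer works. One plausible reading, sketched below purely as an illustration, is an attention-weighted statistics transfer: each output position attends to the appearance features, and the attention-weighted mean and standard deviation of those features are used to re-normalize the output feature. The function names (`channel_normalize`, `appearance_transfer`), the plain-list feature layout, and the exact normalization are assumptions for this sketch and are not taken from the Ctrl-X code.

```python
import math


def channel_normalize(feats, eps=1e-6):
    """Normalize each channel to zero mean and unit std across positions."""
    n, c = len(feats), len(feats[0])
    means = [sum(row[j] for row in feats) / n for j in range(c)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in feats) / n) + eps
        for j in range(c)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(c)] for row in feats]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def appearance_transfer(out_feats, app_feats):
    """Hypothetical attention-weighted mean/std transfer (illustrative only).

    out_feats: list of N feature vectors (length C) from the output branch.
    app_feats: list of M feature vectors (length C) from the appearance image.
    Returns N re-normalized feature vectors of length C.
    """
    c = len(out_feats[0])
    queries = channel_normalize(out_feats)  # normalized output features
    keys = channel_normalize(app_feats)     # normalized appearance features
    scale = 1.0 / math.sqrt(c)
    result = []
    for q in queries:
        # Attention of this output position over all appearance positions.
        logits = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        w = softmax(logits)
        # Attention-weighted first and second moments of the appearance features.
        mean = [sum(wi * row[j] for wi, row in zip(w, app_feats)) for j in range(c)]
        second = [sum(wi * row[j] ** 2 for wi, row in zip(w, app_feats)) for j in range(c)]
        std = [math.sqrt(max(s - m * m, 0.0)) for s, m in zip(second, mean)]
        # Re-scale and shift the normalized output feature with those statistics.
        result.append([s * x + m for s, x, m in zip(std, q, mean)])
    return result
```

In this sketch, an appearance image whose features are identical at every position yields exactly those features at every output position, since the weighted standard deviation collapses to zero. In a real diffusion model the inputs would be intermediate network activations rather than small Python lists.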

Kuan Heng Lin, Sicheng Mo, Ben Klingher, Fangzhou Mu, Bolei Zhou • 2024

Related benchmarks

Task	Dataset	Result
Inference Efficiency	Inference Efficiency Evaluation	Inference Latency (s): 10.91	12
Controllable Text-to-Image Generation	Controllable T2I benchmark	Self-Sim: 0.104	9
Conditional Generation	Controllable generation dataset ControlNet-supported 1.0	Self-sim: 0.134	8
Conditional Generation	Controllable generation dataset New condition 1.0	Self-similarity: 0.135	8
Structure and appearance control	ControlNet-supported	Self-sim: 0.121	7
Structure and appearance control	Natural image	Self-similarity: 0.057	7
Structure and appearance control	New condition	Self-sim: 0.109	7
Controllable Image Generation	User study (Amazon Mechanical Turk)	--	6
Controllable Image Generation	User Study	Preference Rate: 21.67	4

