OminiControl: Minimal and Universal Control for Diffusion Transformer

About

We present OminiControl, a novel approach that rethinks how image conditions are integrated into Diffusion Transformer (DiT) architectures. Current image conditioning methods either introduce substantial parameter overhead or handle only specific control tasks effectively, limiting their practical versatility. OminiControl addresses these limitations through three key innovations: (1) a minimal architectural design that leverages the DiT's own VAE encoder and transformer blocks, requiring just 0.1% additional parameters; (2) a unified sequence processing strategy that combines condition tokens with image tokens for flexible token interactions; and (3) a dynamic position encoding mechanism that adapts to both spatially-aligned and non-aligned control tasks. Our extensive experiments show that this streamlined approach not only matches but surpasses the performance of specialized methods across multiple conditioning tasks. To overcome data limitations in subject-driven generation, we also introduce Subjects200K, a large-scale dataset of identity-consistent image pairs synthesized using DiT models themselves. This work demonstrates that effective image control can be achieved without architectural complexity, opening new possibilities for efficient and versatile image generation systems.
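As a rough illustration of points (2) and (3) in the abstract, the sketch below joins condition tokens and image tokens into one sequence and gives each token a 2D position index. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the row-major grid layout, and the choice to shift non-aligned condition positions by one grid width are all hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple

Position = Tuple[int, int]


def grid_positions(height: int, width: int, offset: Position = (0, 0)) -> List[Position]:
    """Return row-major (row, col) indices for a token grid, shifted by `offset`."""
    d_row, d_col = offset
    return [(r + d_row, c + d_col) for r in range(height) for c in range(width)]


def build_unified_sequence(
    image_tokens: Sequence[str],
    condition_tokens: Sequence[str],
    height: int,
    width: int,
    spatially_aligned: bool,
) -> Tuple[List[str], List[Position]]:
    """Concatenate image and condition tokens and pair each with a position index."""
    if not (len(image_tokens) == len(condition_tokens) == height * width):
        raise ValueError("both token lists must contain height * width tokens")

    image_pos = grid_positions(height, width)

    # Assumption (illustrative only): aligned conditions such as edge or depth
    # maps reuse the image positions, while non-aligned conditions such as a
    # subject reference are shifted past the image grid so they do not overlap.
    cond_offset = (0, 0) if spatially_aligned else (0, width)
    cond_pos = grid_positions(height, width, cond_offset)

    # One joint sequence, so attention can mix image and condition tokens freely.
    tokens = list(image_tokens) + list(condition_tokens)
    positions = image_pos + cond_pos
    return tokens, positions


image = ["x0", "x1", "x2", "x3"]
condition = ["c0", "c1", "c2", "c3"]

aligned_tokens, aligned_pos = build_unified_sequence(image, condition, 2, 2, True)
shifted_tokens, shifted_pos = build_unified_sequence(image, condition, 2, 2, False)
```

In the aligned case each condition token carries the same index as the image token at the same grid location. In the shifted case the two index sets are disjoint. A real DiT would feed these indices to its position encoding rather than keep them as plain tuples.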

Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, Xinchao Wang • 2024

Related benchmarks

Task	Dataset	Result
Subject-driven image generation	DreamBench	DINO Score 68.4	113
Personalized Image Generation	DreamBooth	DINO Score 56.22	45
Subject-driven generation	DreamBench	DINO Score 0.684	30
Personalized Text-to-Image Generation	DreamBench++ Single-subject	CP 0.596	18
Image Personalization	User Study Personalization Tasks	Concept Preservation (CP) 72.1	17
Outfit Generation	VITON-HD	LPIPS 0.594	13
Outfit Generation	Fashion130K	LPIPS 0.683	12
Subject-driven generation	DreamBench v1 (test)	DINO Score 0.684	11
Free-form editing (Customization)	DreamBooth	MDINO Score 6.16	11
Free-form Image Editing	DreamBooth	SC Score 6.33	11

Showing 10 of 35 rows
