DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

About

Diffusion-based text-to-image models hold immense potential for transferring reference styles. However, current encoder-based approaches significantly impair the text controllability of text-to-image models while transferring styles. In this paper, we introduce DEADiff to address this issue using the following two strategies: 1) A mechanism to decouple the style and semantics of reference images. The decoupled feature representations are first extracted by Q-Formers, which are instructed by different text descriptions. They are then injected into mutually exclusive subsets of cross-attention layers for better disentanglement. 2) A non-reconstructive learning method. The Q-Formers are trained using paired images rather than an identical target: the reference image and the ground-truth image share the same style or the same semantics. We show that DEADiff attains the best visual stylization results and an optimal balance between the text controllability inherent in the text-to-image model and style similarity to the reference image, as demonstrated both quantitatively and qualitatively. Our project page is https://tianhao-qi.github.io/DEADiff/.

Tianhao Qi, Shancheng Fang, Yanze Wu, Hongtao Xie, Jiawei Liu, Lang Chen, Qian He, Yongdong Zhang • 2024
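
The two strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer names, the choice of which layers receive semantic features, the `Sample` labels, and all function names are hypothetical and chosen only for illustration. The sketch shows (1) splitting cross-attention layers into two mutually exclusive subsets so each layer sees either the style feature or the semantic feature, and (2) building non-reconstructive training pairs in which the reference is a different image that shares one attribute with the target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    image_id: str
    style: str    # hypothetical style label
    subject: str  # hypothetical semantic (content) label


def split_layers(layer_names, semantic_layers):
    """Partition cross-attention layers into disjoint semantic/style subsets."""
    semantic = set(semantic_layers)
    unknown = semantic - set(layer_names)
    if unknown:
        raise ValueError(f"unknown layers: {sorted(unknown)}")
    # Every layer not assigned to semantics goes to style, so the two
    # subsets are mutually exclusive by construction.
    style = set(layer_names) - semantic
    return {"semantic": semantic, "style": style}


def feature_for_layer(layer, routing, style_feat, semantic_feat):
    """Return the single reference feature a given layer may attend to."""
    if layer in routing["style"]:
        return style_feat
    if layer in routing["semantic"]:
        return semantic_feat
    raise KeyError(layer)


def non_reconstructive_pairs(samples, attribute):
    """Pair each target with a different image that shares `attribute`."""
    pairs = []
    for target in samples:
        for ref in samples:
            same_attr = getattr(ref, attribute) == getattr(target, attribute)
            # Skip the identical image: that would be plain reconstruction.
            if ref.image_id != target.image_id and same_attr:
                pairs.append((ref.image_id, target.image_id))
    return pairs


# Hypothetical layer names; which layers carry semantics is an assumption.
layers = ["down.0", "down.1", "mid", "up.0", "up.1"]
routing = split_layers(layers, semantic_layers=["mid"])

samples = [
    Sample("a", "ink", "cat"),
    Sample("b", "ink", "dog"),
    Sample("c", "oil", "cat"),
]
style_pairs = non_reconstructive_pairs(samples, "style")
semantic_pairs = non_reconstructive_pairs(samples, "subject")
```

In this toy setup, `style_pairs` contains only (reference, target) pairs of distinct images with the same style label, and `semantic_pairs` only pairs of distinct images with the same subject label, mirroring the paired-image training described above.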

Related benchmarks

Task	Dataset	Result
Subject-driven image generation	DreamBench	DINO Score: 53.2	113
Image Style Transfer	User Study	Overall Quality Score: 59.2	30
Disentanglement Analysis	MPI3D complex	DCI Score: 0.336	14
Style Transfer	CIFAR-100 and InstaStyle (test)	Content Score: 28.6	9
Text-driven Style Transfer	Benchmark of 52 prompts and 20 style images 1.0 (test)	Text Alignment: 0.229	8
Style Transfer	Style Transfer Evaluation Set (test)	Style Score: 51.34	8
Style Transfer	Single image on A100 GPU (test)	Inference Time (s): 3	7
Visual Concept Generation	DisenBench	Mask CLIP-I: 0.736	7
Text-driven Style Transfer	User preference study set (test)	Human Preference (Text): 19.3	6
