
Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer

About

Despite the impressive generative capabilities of diffusion models, existing diffusion model-based style transfer methods either require inference-stage optimization (e.g., fine-tuning or textual inversion of the style), which is time-consuming, or fail to leverage the generative ability of large-scale diffusion models. To address these issues, we introduce a novel artistic style transfer method based on a pre-trained large-scale diffusion model that requires no optimization. Specifically, we manipulate the features of the self-attention layers in the way the cross-attention mechanism works: during the generation process, we substitute the key and value of the content image with those of the style image. This approach provides several desirable characteristics for style transfer, including 1) preservation of content, by transferring similar styles into similar image patches, and 2) transfer of style based on the similarity of local texture (e.g., edges) between the content and style images. Furthermore, we introduce query preservation and attention temperature scaling to mitigate disruption of the original content, and initial latent Adaptive Instance Normalization (AdaIN) to deal with disharmonious color (failure to transfer the colors of the style). Our experimental results demonstrate that the proposed method surpasses state-of-the-art methods among both conventional and diffusion-based style transfer baselines.
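The style-injection idea described above can be sketched for a single attention head in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the flat `(tokens, dim)` feature shapes, the form of query preservation (a linear blend of content and stylized queries weighted by a hypothetical `gamma`), and the form of temperature scaling (multiplying the attention logits by a hypothetical `tau` before the softmax) are all assumptions, and the default values are illustrative only. The `adain` helper shows channel-wise AdaIN on initial latents, assuming content and style latents of shape `(C, H, W)`. A real implementation would hook this logic into the self-attention layers of the diffusion U-Net at each denoising step.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def style_injected_attention(q_cs, q_c, k_s, v_s, gamma=0.75, tau=1.5):
    """Self-attention with key/value taken from the style image (sketch).

    q_cs: queries of the stylized image being generated, shape (n, d)
    q_c:  queries of the content image at the same step, shape (n, d)
    k_s, v_s: keys and values of the style image, shape (m, d)
    gamma: assumed query-preservation weight (1.0 = content queries only)
    tau: assumed attention temperature (> 1 sharpens each attention row)
    """
    d = q_c.shape[-1]
    # Query preservation (assumed form): blend content queries back in.
    q = gamma * q_c + (1.0 - gamma) * q_cs
    # Cross-attention-style lookup: content queries attend to style keys,
    # with logits scaled by the temperature factor.
    attn = softmax(tau * (q @ k_s.T) / np.sqrt(d), axis=-1)
    # Each output token is a weighted mix of style values.
    return attn @ v_s, attn


def adain(z_c, z_s, eps=1e-5):
    """Channel-wise AdaIN: give z_c the per-channel mean/std of z_s."""
    mu_c = z_c.mean(axis=(1, 2), keepdims=True)
    sd_c = z_c.std(axis=(1, 2), keepdims=True)
    mu_s = z_s.mean(axis=(1, 2), keepdims=True)
    sd_s = z_s.std(axis=(1, 2), keepdims=True)
    return sd_s * (z_c - mu_c) / (sd_c + eps) + mu_s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q_c = rng.standard_normal((16, 8))
    q_cs = rng.standard_normal((16, 8))
    k_s = rng.standard_normal((32, 8))
    v_s = rng.standard_normal((32, 8))
    out, attn = style_injected_attention(q_cs, q_c, k_s, v_s)
    print(out.shape, attn.shape)

    # Initial latent: content latent re-normalized to style statistics.
    z_init = adain(rng.standard_normal((4, 8, 8)), rng.standard_normal((4, 8, 8)))
    print(z_init.shape)
```

With `gamma=1.0` the output depends only on the content queries, so the stylized image's own queries are ignored; lowering `gamma` gives them more influence. A `tau` above 1 sharpens each attention row, so every content patch draws from fewer, more similar style patches.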

Jiwoo Chung, Sangeek Hyun, Jae-Pil Heo • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Style Transfer | MS-COCO (content) + WikiArt (style) (test) | LPIPS | 0.4803 | 31 |
| Artistic Style Transfer | MS-COCO content images and WikiArt style images, 512x512 resolution (test) | FID (Artistic Style) | 28.801 | 13 |
| Artistic transfer | WikiArt | FID (Style) | 18.131 | 11 |
| Photo-realistic transfer | MSCOCO | FID (Style) | 24.349 | 11 |
| Semantic Style Transfer | quadruple data (val) | SSL | 1.7538 | 11 |
| Style Transfer | User Study: 10 content images, 8 style images (test) | Style Score | 21.2 | 9 |
| Style Transfer | Style-Content Pairs: 50 style x 40 content references (test) | CSD Score | 0.453 | 8 |
| Image-driven Style Transfer | Image-driven style transfer (evaluation set) | CLIP Alignment Score | 0.604 | 7 |
| Style Transfer | Content and style image pairs | Inference Time (sec) | 12.4 | 4 |
| Image Style Transfer | Curated image style transfer (test) | CSD-Score | 0.4 | 4 |

