
Training-free Content Injection using h-space in Diffusion Models

About

Diffusion models (DMs) synthesize high-quality images in various domains. However, controlling their generative process remains poorly understood because the intermediate variables in the process have not been rigorously studied. Recently, the bottleneck feature of the U-Net, namely $h$-space, was found to convey the semantics of the resulting image, which enables StyleCLIP-like latent editing within DMs. In this paper, we explore uses of $h$-space beyond attribute editing and introduce a method that injects the content of one image into another image by combining their features in the generative processes. Briefly, given the original generative process of the other image, 1) we gradually blend the bottleneck feature of the content with proper normalization, and 2) we calibrate the skip connections to match the injected content. Unlike custom-diffusion approaches, our method does not require time-consuming optimization or fine-tuning. Instead, it manipulates intermediate features within a feed-forward generative process. Furthermore, our method does not require supervision from external networks. The code is available at https://curryjung.github.io/InjectFusion/
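Step 1 of the abstract (gradual, normalized blending of bottleneck features) can be illustrated with a small sketch. The abstract only says "proper normalization"; the spherical interpolation, the rescaling to the original feature norm, and the linear ramp below are assumptions chosen for illustration, and the names `slerp`, `inject`, and `gamma_schedule` are hypothetical rather than taken from the authors' code. Features are flattened to plain Python lists to keep the example dependency-free; a real U-Net would supply tensors.

```python
import math


def norm(v):
    """Euclidean norm of a flattened feature vector."""
    return math.sqrt(sum(x * x for x in v))


def slerp(h_orig, h_content, gamma):
    """Spherical interpolation between the directions of two feature vectors.

    gamma=0 returns the unit direction of h_orig, gamma=1 that of h_content.
    (Assumed blending rule; the abstract does not specify one.)
    """
    n0, n1 = norm(h_orig), norm(h_content)
    u0 = [x / n0 for x in h_orig]
    u1 = [x / n1 for x in h_content]
    cos = max(-1.0, min(1.0, sum(a * b for a, b in zip(u0, u1))))
    theta = math.acos(cos)
    s = math.sin(theta)
    if s < 1e-8:
        # Degenerate case (parallel or antiparallel): keep the original direction.
        return list(u0)
    w0 = math.sin((1.0 - gamma) * theta) / s
    w1 = math.sin(gamma * theta) / s
    return [w0 * a + w1 * b for a, b in zip(u0, u1)]


def inject(h_orig, h_content, gamma):
    """Blend directions, then rescale to the norm of the original feature.

    Keeping the original norm is one possible reading of "proper normalization".
    """
    direction = slerp(h_orig, h_content, gamma)
    scale = norm(h_orig)
    return [scale * d for d in direction]


def gamma_schedule(step, num_steps, gamma_max=0.6):
    """Hypothetical linear ramp: the content weight grows over the sampling steps."""
    return gamma_max * (step + 1) / num_steps


if __name__ == "__main__":
    h_orig = [3.0, 0.0]     # toy bottleneck feature of the original image
    h_content = [0.0, 1.0]  # toy bottleneck feature of the content image
    for step in range(5):
        g = gamma_schedule(step, 5)
        print(step, round(g, 2), inject(h_orig, h_content, g))
```

Whatever the blend weight, the output keeps the norm of the original feature, so later layers see activations of a familiar scale. Step 2 of the abstract (calibrating the skip connections) is not covered by this sketch.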

Jaeseok Jeong, Mingi Kwon, Youngjung Uh • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Artistic Style Transfer | MS-COCO content images and WikiArt style images, 512x512 resolution (test) | FID (Artistic Style): 41.464 | 13 |
| Artistic transfer | WikiArt | FID (Style): 20.903 | 11 |
| Photo-realistic transfer | MSCOCO | FID (Style): 33.119 | 11 |
| Style Transfer | Content and style image pairs | Inference Time (sec): 355.9 | 4 |