TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition
About
Text-driven diffusion models have exhibited impressive generative capabilities, enabling various image editing tasks. In this paper, we propose TF-ICON, a novel Training-Free Image COmpositioN framework that harnesses the power of text-driven diffusion models for cross-domain image-guided composition. This task aims to seamlessly integrate user-provided objects into a specific visual context. Current diffusion-based methods often involve costly instance-based optimization or finetuning of pretrained models on customized datasets, which can potentially undermine their rich prior. In contrast, TF-ICON can leverage off-the-shelf diffusion models to perform cross-domain image-guided composition without requiring additional training, finetuning, or optimization. Moreover, we introduce the exceptional prompt, which contains no information, to facilitate text-driven diffusion models in accurately inverting real images into latent representations, forming the basis for compositing. Our experiments show that equipping Stable Diffusion with the exceptional prompt outperforms state-of-the-art inversion methods on various datasets (CelebA-HQ, COCO, and ImageNet), and that TF-ICON surpasses prior baselines in versatile visual domains. Code is available at https://github.com/Shilin-LU/TF-ICON
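The abstract's inversion claim is about how faithfully a real image can be mapped to a latent and then recovered. The following is a minimal sketch of deterministic DDIM-style inversion on a toy vector, assuming a hand-written noise predictor in place of Stable Diffusion. It is not the TF-ICON implementation and does not implement the exceptional prompt. All names (`ddim_step`, `ddim_invert`, `ddim_sample`, `constant_eps`, `state_dependent_eps`, `ABAR`) and the noise schedule are illustrative assumptions.

```python
import math

def ddim_step(x, eps, a_from, a_to):
    """Move a sample from noise level a_from to a_to with the deterministic DDIM update."""
    out = []
    for xi, ei in zip(x, eps):
        # Estimate the clean value implied by the current sample and predicted noise.
        x0_pred = (xi - math.sqrt(1 - a_from) * ei) / math.sqrt(a_from)
        out.append(math.sqrt(a_to) * x0_pred + math.sqrt(1 - a_to) * ei)
    return out

def ddim_invert(x0, eps_model, abar):
    """Image -> latent: walk the schedule from low noise to high noise."""
    x = list(x0)
    for i in range(len(abar) - 1):
        x = ddim_step(x, eps_model(x, i), abar[i], abar[i + 1])
    return x

def ddim_sample(x_T, eps_model, abar):
    """Latent -> image: walk the same schedule back down."""
    x = list(x_T)
    for i in range(len(abar) - 1, 0, -1):
        x = ddim_step(x, eps_model(x, i), abar[i], abar[i - 1])
    return x

def max_abs_error(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

# Assumed toy schedule of cumulative alpha values (not taken from the paper).
STEPS = 50
ABAR = [0.9999 + (0.02 - 0.9999) * i / STEPS for i in range(STEPS + 1)]

def constant_eps(x, i):
    # Hypothetical predictor that ignores its input: the round trip is exact.
    return [0.3 for _ in x]

def state_dependent_eps(x, i):
    # Hypothetical predictor that depends on the sample: the round trip is approximate.
    return [0.1 * math.tanh(v) for v in x]

pixels = [0.5, -1.0, 0.25, 0.8]
for name, model in [("constant", constant_eps), ("state-dependent", state_dependent_eps)]:
    latent = ddim_invert(pixels, model, ABAR)
    recon = ddim_sample(latent, model, ABAR)
    print(name, "reconstruction error:", max_abs_error(pixels, recon))
```

In this toy setting the round trip is exact only when inversion and sampling see the same noise prediction at each step. A real text-conditioned model's prediction depends on the sample, the timestep, and the prompt, so the reconstruction is approximate, and that reconstruction accuracy is the quantity the abstract's inversion comparison refers to.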
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Compositional Image Generation | ComplexCompo 300 | CLIP-I 0.6987 | 20 |
| Image Composition | DreamEditBench 220 | CLIP-I 0.7479 | 14 |
| Image Editing | Complex-Compo 300 | HPSv3 9.3258 | 13 |
| Image Editing | DreamEdit-Bench 220 | HPSv3 7.2643 | 13 |
| Image Composition | User Study | Average Ranking 8.36 | 13 |
| Image Composition | Resolution Benchmark 512 x 512 | Latency (s) 24.55 | 13 |
| Object Compositing | DreamBooth (test) | -- | 10 |
| Image Compositing | DreamBooth + COCO (val) | Quality Score 2.75 | 6 |
| Identity-preserving Image Generation | DreamBooth (test) | Realism 0.4662 | 6 |
| Image Compositing | DreamBooth + COCO val (test) | DINO Score 74.743 | 5 |