
Diffusion Self-Distillation for Zero-Shot Customized Image Generation

About

Text-to-image diffusion models produce impressive results but are frustrating tools for artists who desire fine-grained control. For example, a common use case is to create images of a specific instance in novel contexts, i.e., "identity-preserving generation". This setting, along with many other tasks (e.g., relighting), is a natural fit for image+text-conditional generative models. However, there is insufficient high-quality paired data to train such a model directly. We propose Diffusion Self-Distillation, a method for using a pre-trained text-to-image model to generate its own dataset for text-conditioned image-to-image tasks. We first leverage a text-to-image diffusion model's in-context generation ability to create grids of images and curate a large paired dataset with the help of a Visual-Language Model. We then fine-tune the text-to-image model into a text+image-to-image model using the curated paired dataset. We demonstrate that Diffusion Self-Distillation outperforms existing zero-shot methods and is competitive with per-instance tuning techniques on a wide range of identity-preservation generation tasks, without requiring test-time optimization.

Shengqu Cai, Eric Chan, Yunzhi Zhang, Leonidas Guibas, Jiajun Wu, Gordon Wetzstein • 2024
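
The data-generation step described in the abstract (generate a grid of images, then curate pairs) can be illustrated with a toy sketch. This is not the authors' code: the names `split_grid` and `curate_pairs` are hypothetical, images are modelled as nested lists of pixel values, and the `same_identity` callback is only a stand-in for the Vision-Language Model check, which a real pipeline would implement with an actual model.

```python
from itertools import permutations
from typing import Callable, List, Tuple

# Toy image type (an assumption for illustration): rows of grayscale pixels.
Image = List[List[int]]


def split_grid(grid: Image, rows: int, cols: int) -> List[Image]:
    """Cut one grid image into rows * cols equally sized panels."""
    height, width = len(grid), len(grid[0])
    if height % rows or width % cols:
        raise ValueError("grid size must be divisible by rows and cols")
    panel_h, panel_w = height // rows, width // cols
    panels = []
    for r in range(rows):
        for c in range(cols):
            panel = [
                row[c * panel_w:(c + 1) * panel_w]
                for row in grid[r * panel_h:(r + 1) * panel_h]
            ]
            panels.append(panel)
    return panels


def curate_pairs(
    panels: List[Image],
    same_identity: Callable[[Image, Image], bool],
) -> List[Tuple[Image, Image]]:
    """Keep ordered (reference, target) pairs accepted by the filter.

    `same_identity` is a hypothetical placeholder for the VLM-based
    curation step; here it is just a user-supplied predicate.
    """
    return [(a, b) for a, b in permutations(panels, 2) if same_identity(a, b)]


# Usage: a 2x4 "grid" holding two 2x2 panels side by side.
grid = [[1, 1, 2, 2],
        [1, 1, 2, 2]]
panels = split_grid(grid, rows=1, cols=2)
pairs = curate_pairs(panels, same_identity=lambda a, b: True)
```

Each accepted pair would then serve as one (conditioning image, target image) training example for the fine-tuning stage, with the text prompt attached separately.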

Related benchmarks

Task | Dataset | Result | Rank
Cinematic Story Generation | ViStoryBench | CSD (Cross): 0.417 | 24
Personalized Text-to-Image Generation | DreamBench++ Single-subject | CP: 0.513 | 18
Image Personalization | User Study Personalization Tasks | Concept Preservation (CP): 64.4 | 17
Personalized Text-to-Image Generation | DreamBench++ (test) | CP Score: 3.661 | 8
Multi-object compositing | Multi-object compositing (test) | CLIP-I: 0.65 | 8
Personalized Image Generation | DreamBench++ GPT-4o score evaluation (test) | CP (Animal): 64.7 | 8
Continuous Story Generation | AnimeBoard-GT | CSD Cross: 0.501 | 7
3D-conditioned Image Generation | User Study | Faithfulness: 4.145 | 6
Identity-preserving Image Generation | 3D Assets (test) | GPT-eval Texture: 4.842 | 6
