
Multi-Concept Customization of Text-to-Image Diffusion

About

While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to quickly acquire a new concept, given a few examples? Furthermore, can we compose multiple new concepts together? We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models. We find that only optimizing a few parameters in the text-to-image conditioning mechanism is sufficiently powerful to represent new concepts while enabling fast tuning (~6 minutes). Additionally, we can jointly train for multiple concepts or combine multiple fine-tuned models into one via closed-form constrained optimization. Our fine-tuned model generates variations of multiple new concepts and seamlessly composes them with existing concepts in novel settings. Our method outperforms or performs on par with several baselines and concurrent works in both qualitative and quantitative evaluations while being memory and computationally efficient.
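As a rough illustration of "only optimizing a few parameters", the sketch below splits a toy parameter inventory into trainable and frozen sets. It assumes that "the text-to-image conditioning mechanism" refers to the key and value projections of the cross-attention layers. The parameter names (a diffusers-like convention where `attn2` marks cross-attention), the sizes, and the helpers `is_trainable` and `split_params` are all hypothetical and are not taken from the authors' code.

```python
# Hypothetical parameter inventory: name -> number of scalar weights.
# Names imitate a diffusers-like layout (attn1 = self-attention,
# attn2 = cross-attention); the sizes are made up for illustration.
TOY_PARAMS = {
    "down.0.attn1.to_q.weight": 320 * 320,
    "down.0.attn1.to_k.weight": 320 * 320,
    "down.0.attn1.to_v.weight": 320 * 320,
    "down.0.attn2.to_q.weight": 320 * 320,
    "down.0.attn2.to_k.weight": 320 * 768,
    "down.0.attn2.to_v.weight": 320 * 768,
    "down.0.resnet.conv1.weight": 320 * 320 * 9,
}


def is_trainable(name: str) -> bool:
    """Assumed rule: update only cross-attention key/value projections."""
    return "attn2" in name and (".to_k." in name or ".to_v." in name)


def split_params(params: dict) -> tuple:
    """Return (trainable, frozen) dictionaries according to is_trainable."""
    trainable = {n: c for n, c in params.items() if is_trainable(n)}
    frozen = {n: c for n, c in params.items() if n not in trainable}
    return trainable, frozen


trainable, frozen = split_params(TOY_PARAMS)
fraction = sum(trainable.values()) / sum(TOY_PARAMS.values())

print("trainable:", sorted(trainable))
print(f"trainable fraction of toy model: {fraction:.3f}")
```

In a real training loop the same name filter would decide which tensors receive gradients. The fraction printed here describes only the toy inventory, not the actual model.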

Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu • 2022

Related benchmarks

Task | Dataset | Result | Rank
Subject-driven image generation | DreamBench | DINO Score: 69.5 | 62
Multi-Concept Image Generation | 12-concept dataset | Text Alignment: 0.673 | 26
Image Generation | Faces | FID: 40.98 | 18
Text-to-Image Personalization | DreamBooth original (test) | DINO Score: 0.643 | 18
Subject-driven image generation | DreamBooth Dataset 1.0 (test) | DINO Score: 0.3967 | 16
Illumination-preserving image editing | 16 concepts under seven illuminants 1.0 (test) | Angular Error: 13.34 | 12
Customized Text-to-Image Generation | DreamBench (test) | DINO Score: 0.643 | 12
Face Personalization | FaceForensics++ (test) | AdaFace Score: 0.4537 | 10
Few-shot personalization and encoder-based methods evaluation | Standard Personalization Dataset | CLIP-I: 64.79 | 9
Multi-subject customization | User Study (Single Subject) | Text Alignment: 0.7685 | 8
Showing 10 of 76 rows