An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

About

Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new "words" in the embedding space of a frozen text-to-image model. These "words" can be composed into natural language sentences, guiding personalized creation in an intuitive way. Notably, we find evidence that a single word embedding is sufficient for capturing unique and varied concepts. We compare our approach to a wide range of baselines, and demonstrate that it can more faithfully portray the concepts across a range of applications and tasks. Our code, data and new words will be available at: https://textual-inversion.github.io
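The core idea in the abstract, learning one new "word" embedding from a handful of examples while the model itself stays frozen, can be illustrated with a toy sketch. This is not the authors' implementation: the frozen text-to-image model is replaced here by a small fixed linear map, the 3-5 concept images by hand-written feature vectors, and the training objective by a plain squared error. All names (`FROZEN_WEIGHTS`, `CONCEPT_FEATURES`, `learn_concept_embedding`) are hypothetical and chosen only for illustration.

```python
import random

# Assumption: a tiny fixed linear map stands in for the frozen model.
# It is never updated during optimization.
FROZEN_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Assumption: three hand-written feature vectors stand in for the
# 3-5 user-provided images of a concept.
CONCEPT_FEATURES = [
    [0.5, -1.0, -0.5],
    [0.6, -0.9, -0.5],
    [0.4, -1.1, -0.5],
]


def frozen_model(embedding, weights):
    """Map an embedding to output features using fixed weights."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in weights]


def learn_concept_embedding(concept_features, weights, steps=500, lr=0.05, seed=0):
    """Gradient descent on a single embedding vector; weights stay untouched."""
    rng = random.Random(seed)
    dim = len(weights[0])
    embedding = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    for _ in range(steps):
        output = frozen_model(embedding, weights)
        grad = [0.0] * dim
        for target in concept_features:
            residual = [o - t for o, t in zip(output, target)]
            # Gradient of 0.5 * ||W v - t||^2 with respect to v is W^T (W v - t).
            for i, row in enumerate(weights):
                for j in range(dim):
                    grad[j] += row[j] * residual[i]
        # Average the gradient over the examples and update only the embedding.
        embedding = [e - lr * g / len(concept_features)
                     for e, g in zip(embedding, grad)]
    return embedding


learned = learn_concept_embedding(CONCEPT_FEATURES, FROZEN_WEIGHTS)
print(learned)
```

In the actual method the optimized vector would be a token embedding fed to the text encoder of a real generative model; the sketch only shows the division of labor, where the single embedding is the sole trainable parameter.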

Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or • 2022

Related benchmarks

Task | Dataset | Result | Rank
Subject-driven image generation | DreamBench | DINO Score: 57.1 | 62
Few-shot Image Classification | miniImageNet meta (test) | Accuracy: 85.44 | 46
Image Style Transfer | User Study | Overall Quality Score: 37.9 | 30
Subject-driven generation | DreamBench (test) | DINO Score: 0.569 | 25
Consistent Text-to-Image Generation | ConsiStory+ (test) | CLIP-T: 0.8557 | 23
Image Generation | Faces | FID: 70.62 | 18
Personalized Text-to-Image Generation | DreamBench++ Single-subject | CP: 0.384 | 18
Text-to-Image Personalization | DreamBooth original (test) | DINO Score: 0.569 | 18
Image Personalization | User Study Personalization Tasks | Concept Preservation (CP): 14.2 | 17
Subject-driven image generation | DreamBooth Dataset 1.0 (test) | DINO Score: 0.569 | 16
Showing 10 of 93 rows