TextBoost: Boosting Text Encoder for Personalized Text-to-Image Generation

About

In this paper, we introduce TextBoost, an efficient one-shot personalization approach for text-to-image diffusion models. Traditional personalization methods typically involve fine-tuning extensive portions of the model, leading to substantial storage requirements and slow convergence. In contrast, we propose selectively fine-tuning only the text encoder, significantly improving computational and storage efficiency. To preserve the original semantic integrity, we develop a novel causality-preserving adaptation mechanism. Additionally, lightweight adapters are employed to locally refine text embeddings immediately before their interaction with cross-attention layers, greatly enhancing the expressiveness of text embeddings with minimal computational overhead. Empirical evaluations across diverse concepts demonstrate that TextBoost achieves faster convergence and substantially reduces storage demands by minimizing the number of trainable parameters. Furthermore, TextBoost maintains comparable subject fidelity, superior text fidelity, and greater generation diversity compared to existing methods. We show that our proposed method offers an efficient, scalable, and practically applicable solution for high-quality text-to-image personalization, particularly beneficial in resource-constrained environments.
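To make the idea of a lightweight adapter that refines text embeddings more concrete, here is a minimal sketch in plain Python. The abstract does not specify the adapter's form, so everything below is an assumption for illustration: a generic residual bottleneck adapter, the hypothetical helper names `make_adapter`, `apply_adapter`, and `num_params`, the embedding width of 768, and the bottleneck rank of 8. It is not the authors' implementation; it only shows why such an adapter adds few trainable parameters and can start out as an identity mapping.

```python
import random


def make_adapter(dim, rank, seed=0):
    """Build a hypothetical bottleneck adapter (assumed form, not from the paper)."""
    rng = random.Random(seed)
    # Down-projection (rank x dim) with small random weights.
    down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rank)]
    # Up-projection (dim x rank) initialised to zero, so the adapter
    # initially leaves every embedding unchanged.
    up = [[0.0] * rank for _ in range(dim)]
    return {"down": down, "up": up}


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def apply_adapter(adapter, embedding):
    """Return embedding + up(relu(down(embedding)))."""
    hidden = [max(0.0, h) for h in matvec(adapter["down"], embedding)]
    delta = matvec(adapter["up"], hidden)
    return [e + d for e, d in zip(embedding, delta)]


def num_params(adapter):
    """Count the scalar weights held by the adapter."""
    return sum(len(row) for matrix in adapter.values() for row in matrix)


dim, rank = 768, 8  # illustrative sizes, chosen for this sketch only
adapter = make_adapter(dim, rank)
token_embedding = [0.1] * dim
refined = apply_adapter(adapter, token_embedding)

adapter_params = num_params(adapter)  # 2 * dim * rank
dense_params = dim * dim              # a full dim x dim layer, for comparison
```

Under these assumed sizes the adapter holds 12,288 weights, against 589,824 for a dense 768 x 768 layer, and because the up-projection starts at zero, `refined` equals `token_embedding` before any training step.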

NaHyeon Park, Kunhee Kim, Hyunjung Shim • 2024

Related benchmarks

Task | Dataset | Result | Rank
Personalized Image Generation | DreamBooth | CLIP-I Score: 57 | 34
Customized Text-to-Image Generation | DreamBench (test) | DINO Score: 0.167 | 21
Personalized Text-to-Image Generation | DreamBooth | VQA Score: 57.7 | 4