TextBoost: Boosting Text Encoder for Personalized Text-to-Image Generation

About

In this paper, we introduce TextBoost, an efficient one-shot personalization approach for text-to-image diffusion models. Traditional personalization methods typically involve fine-tuning extensive portions of the model, leading to substantial storage requirements and slow convergence. In contrast, we propose selectively fine-tuning only the text encoder, significantly improving computational and storage efficiency. To preserve the original semantic integrity, we develop a novel causality-preserving adaptation mechanism. Additionally, lightweight adapters are employed to locally refine text embeddings immediately before their interaction with cross-attention layers, greatly enhancing the expressiveness of text embeddings with minimal computational overhead. Empirical evaluations across diverse concepts demonstrate that TextBoost achieves faster convergence and substantially reduces storage demands by minimizing the number of trainable parameters. Furthermore, TextBoost maintains comparable subject fidelity, superior text fidelity, and greater generation diversity compared to existing methods. We show that our proposed method offers an efficient, scalable, and practically applicable solution for high-quality text-to-image personalization, particularly beneficial in resource-constrained environments.

NaHyeon Park, Kunhee Kim, Hyunjung Shim • 2024
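The abstract describes lightweight adapters that locally refine text embeddings just before they reach the cross-attention layers. As a rough illustration of why this keeps the trainable parameter count small, here is a minimal sketch. It assumes a residual bottleneck adapter applied to each token embedding independently. The abstract does not specify the adapter's form, so the bottleneck design, the ReLU, the zero initialization, the embedding width of 768, the rank of 4, and every function name below are illustrative assumptions, not the authors' implementation.

```python
import random


def make_adapter(dim, rank, seed=0):
    """Build a hypothetical bottleneck adapter: down (dim x rank), up (rank x dim)."""
    rng = random.Random(seed)
    down = [[rng.gauss(0.0, 0.02) for _ in range(rank)] for _ in range(dim)]
    # Assumption: the up-projection starts at zero, so the adapter is an
    # identity map before training and the original embeddings are preserved.
    up = [[0.0] * dim for _ in range(rank)]
    return {"down": down, "up": up}


def matvec(vec, mat):
    """Multiply a row vector of length n by an n x m matrix."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def apply_adapter(adapter, token_embeddings):
    """Refine each token embedding on its own: e + up(relu(down(e)))."""
    refined = []
    for e in token_embeddings:
        hidden = [max(0.0, h) for h in matvec(e, adapter["down"])]
        delta = matvec(hidden, adapter["up"])
        refined.append([x + d for x, d in zip(e, delta)])
    return refined


def num_params(adapter):
    """Count the scalar weights held by the adapter."""
    return sum(len(row) for matrix in adapter.values() for row in matrix)


dim, rank = 768, 4  # illustrative sizes, not taken from the paper
adapter = make_adapter(dim, rank)
tokens = [[0.1] * dim, [-0.2] * dim]  # two stand-in token embeddings
out = apply_adapter(adapter, tokens)

print(out == tokens)        # True: zero-initialized adapter leaves embeddings unchanged
print(num_params(adapter))  # 6144 weights, versus 768 * 768 = 589824 for a full matrix
```

Under these assumptions the adapter holds about 1% of the weights of a single full 768 x 768 projection, which is the kind of saving the abstract attributes to minimizing trainable parameters.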

Related benchmarks

Task	Dataset	Result
Personalized Image Generation	DreamBooth	DINO Score 16.7	45
Customized Text-to-Image Generation	DreamBench (test)	DINO Score 0.167	21
Personalized Text-to-Image Generation	DreamBooth	VQA Score 57.7	4

