
LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

About

Diffusion models have exhibited substantial success in text-to-image generation. However, they often encounter challenges when dealing with complex and dense prompts involving multiple objects, attribute binding, and long descriptions. In this paper, we propose a novel framework called LLM4GEN, which enhances the semantic understanding of text-to-image diffusion models by leveraging the representation of Large Language Models (LLMs). It can be seamlessly incorporated into various diffusion models as a plug-and-play component. A specially designed Cross-Adapter Module (CAM) integrates the original text features of text-to-image models with LLM features, thereby enhancing text-to-image generation. Additionally, to facilitate and correct entity-attribute relationships in text prompts, we develop an entity-guided regularization loss to further improve generation performance. We also introduce DensePrompts, which contains 7,000 dense prompts to provide a comprehensive evaluation for the text-to-image generation task. Experiments indicate that LLM4GEN significantly improves the semantic alignment of SD1.5 and SDXL, demonstrating increases of 9.69% and 12.90% in color on T2I-CompBench, respectively. Moreover, it surpasses existing models in terms of sample quality, image-text alignment, and human evaluation.

Mushui Liu, Yuhang Ma, Yang Zhen, Jun Dan, Yunlong Yu, Zeng Zhao, Zhipeng Hu, Bai Liu, Changjie Fan • 2024

Related benchmarks

Task | Dataset | Result | Rank
Text-to-Image Generation | GenEval | GenEval Score: 0.4083 | 108
Text-to-Image Generation | R2I-Bench | Causal Accuracy: 45 | 28
Long-text-to-Image Generation | DetailMaster | Character Presence: 19.43 | 12
Emotion-conditioned Text-to-Image Generation | Emotion-conditioned image generation (inference set) | Emo-A: 21.22 | 10
Image Quality Assessment | LongAlign | CLIPScore: 0.3362 | 5
Text-to-Image Generation | T2I-CompBench | Color: 50.84 | 5
