
InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

About

Recent strides in the development of diffusion models, exemplified by advancements such as Stable Diffusion, have underscored their remarkable prowess in generating visually compelling images. However, the imperative of achieving a seamless alignment between the generated image and the provided prompt persists as a formidable challenge. This paper traces the root of these difficulties to invalid initial noise, and proposes a solution in the form of Initial Noise Optimization (InitNO), a paradigm that refines this noise. Considering text prompts, not all random noises are effective in synthesizing semantically-faithful images. We design the cross-attention response score and the self-attention conflict score to evaluate the initial noise, bifurcating the initial latent space into valid and invalid sectors. A strategically crafted noise optimization pipeline is developed to guide the initial noise towards valid regions. Our method, validated through rigorous experimentation, shows a commendable proficiency in generating images in strict accordance with text prompts. Our code is available at https://github.com/xiefan-guo/initno.
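The idea in the abstract (score the initial noise, treat it as valid or invalid, and optimize invalid noise toward a valid region) can be illustrated with a small toy sketch. This is not the authors' implementation: `toy_score` is a hypothetical stand-in for the cross-attention response and self-attention conflict scores, which in the paper are computed from a diffusion model's attention maps, and the `target`, `threshold`, `lr`, and `max_steps` values are assumptions chosen only for illustration.

```python
def toy_score(noise, target):
    """Hypothetical placeholder score: mean squared distance to a target.

    Lower is better. The real scores are derived from attention maps,
    which this toy example does not model.
    """
    return sum((n - t) ** 2 for n, t in zip(noise, target)) / len(noise)


def toy_score_grad(noise, target):
    """Analytic gradient of toy_score with respect to each noise element."""
    return [2.0 * (n - t) / len(noise) for n, t in zip(noise, target)]


def optimize_initial_noise(noise, target, threshold=0.05, lr=0.5, max_steps=200):
    """Gradient descent on the noise until its score marks it as 'valid'.

    Returns (noise, steps_taken, is_valid). The input list is not modified.
    """
    noise = list(noise)  # work on a copy
    for step in range(max_steps):
        if toy_score(noise, target) <= threshold:
            return noise, step, True  # noise is in the 'valid' region
        grad = toy_score_grad(noise, target)
        noise = [n - lr * g for n, g in zip(noise, grad)]
    return noise, max_steps, toy_score(noise, target) <= threshold


if __name__ == "__main__":
    start = [3.0] * 8      # deliberately 'invalid' starting noise
    target = [0.0] * 8     # assumed centre of the toy 'valid' region
    final, steps, valid = optimize_initial_noise(start, target)
    print(f"valid={valid} after {steps} steps, score={toy_score(final, target):.4f}")
```

In a real pipeline the score would come from running the text-conditioned denoiser on the candidate noise, and the gradient would be obtained by automatic differentiation rather than a closed-form expression.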

Xiefan Guo, Jinlin Liu, Miaomiao Cui, Jiankai Li, Hongyu Yang, Di Huang • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Text-to-Image Generation | T2I-CompBench | Shape Fidelity | 55.91 | 185 |
| Text-to-Image Generation | Pick-a-Pic | ImageReward | -1.9692 | 107 |
| Text-to-Image Compositional Alignment | T2I-CompBench++ v2 (test) | Color | 70.38 | 37 |
| Text-to-Image Generation | HPD | PickScore | 16.765 | 22 |
| Text-to-Image Generation | VISOR | OA (%) | 60.4 | 21 |
| Text-to-Image Generation | Chefer prompts Animal-Animal | BLIP-VQA Score | 72.64 | 12 |
| Text-to-Image Generation | Chefer prompts (Object-Object) | BLIP-VQA | 54.06 | 12 |
| Text-to-Image Generation | Chefer Prompts Animal-Object | BLIP-VQA | 0.7998 | 12 |
| Text-to-Image Synthesis | User study 20 questions (test) | User Preference Rate | 63.33 | 7 |
| Text-to-Image Generation | DrawBench | PickScore | 17.4188 | 7 |
