Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment

About

Despite substantial progress in text-to-image generation, achieving precise text-image alignment remains challenging, particularly for prompts with rich compositional structure or imaginative elements. To address this, we introduce Negative Prompting for Image Correction (NPC), an automated pipeline that improves alignment by identifying and applying negative prompts that suppress unintended content. We begin by analyzing cross-attention patterns to explain why both targeted negatives-those directly tied to the prompt's alignment error-and untargeted negatives-tokens unrelated to the prompt but present in the generated image-can enhance alignment. To discover useful negatives, NPC generates candidate prompts using a verifier-captioner-proposer framework and ranks them with a salient text-space score, enabling effective selection without requiring additional image synthesis. On GenEval++ and Imagine-Bench, NPC outperforms strong baselines, achieving 0.571 vs. 0.371 on GenEval++ and the best overall performance on Imagine-Bench. By guiding what not to generate, NPC provides a principled, fully automated route to stronger text-image alignment in diffusion models. Code is released at https://github.com/wiarae/NPC.

Sangha Park, Eunji Kim, Yeongtak Oh, Jooyoung Choi, Sungroh Yoon• 2025

Related benchmarks

TaskDatasetResultRank
Text-to-Image GenerationT2I-CompBench
Shape Fidelity56.81
94
Text-to-Image GenerationDPG-Bench (test)
Global Fidelity83.88
43
Text-to-Image GenerationGenEval++
Color Accuracy55
35
Text-to-Image GenerationImagine-Bench
Attribute Shift Score6.31
33
Showing 4 of 4 rows

Other info

Follow for update