Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models

About

Negative guidance -- explicitly suppressing unwanted attributes -- remains a fundamental challenge in diffusion models, particularly in few-step sampling regimes. Classifier-Free Guidance (CFG) works well in standard settings, but it fails when the number of sampling steps is aggressively reduced, because the predictions of the positive and negative branches diverge. We present Normalized Attention Guidance (NAG), an efficient, training-free mechanism that applies extrapolation in attention space with L1-based normalization and refinement. NAG restores effective negative guidance where CFG collapses while maintaining fidelity. Unlike existing approaches, NAG generalizes across architectures (UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image, video), functioning as a universal plug-in with minimal computational overhead. Through extensive experimentation, we demonstrate consistent improvements in text alignment (CLIP Score), fidelity (FID, PFID), and human-perceived quality (ImageReward). Our ablation studies validate each design component, and user studies confirm a significant preference for NAG-guided outputs. As a model-agnostic, inference-time approach that requires no retraining, NAG provides effortless negative guidance for all modern diffusion frameworks; pseudocode is given in the Appendix.
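The abstract names three steps (extrapolation in attention space, L1-based normalization, refinement) without giving formulas. The sketch below is one plausible reading, not the paper's pseudocode. It assumes: (1) extrapolation is z_pos + scale * (z_pos - z_neg), applied per token to the attention outputs computed with the positive and negative prompts; (2) normalization caps the ratio of the extrapolated feature's L1 norm to the positive feature's L1 norm at a threshold tau; (3) refinement is a linear blend with weight alpha back toward z_pos. The function nag_attention, its parameters scale, tau, alpha, and their default values are hypothetical and chosen only for illustration.

```python
def l1_norm(vec):
    """L1 norm of a feature vector given as a list of floats."""
    return sum(abs(x) for x in vec)


def nag_attention(z_pos, z_neg, scale=5.0, tau=2.5, alpha=0.25):
    """Hypothetical NAG-style combination of attention outputs.

    z_pos, z_neg: lists of per-token feature vectors (same shape) from the
    positive-prompt and negative-prompt branches of one attention layer.
    The exact formulas here are assumptions, not the paper's definition.
    """
    out = []
    for zp, zn in zip(z_pos, z_neg):
        # 1) Extrapolate away from the negative branch in attention-output space.
        z_ext = [p + scale * (p - n) for p, n in zip(zp, zn)]

        # 2) L1-based normalization: if the extrapolated feature's L1 norm has
        #    grown more than tau times the positive feature's, rescale it so the
        #    ratio equals tau exactly.
        norm_pos = l1_norm(zp)
        if norm_pos > 0:
            ratio = l1_norm(z_ext) / norm_pos
            if ratio > tau:
                z_ext = [x * tau / ratio for x in z_ext]

        # 3) Refinement: blend the guided feature back toward the positive branch.
        out.append([alpha * e + (1 - alpha) * p for e, p in zip(z_ext, zp)])
    return out


# Usage: one token with two channels.
print(nag_attention([[1.0, 0.0]], [[0.0, 1.0]], scale=1.0, tau=1.5, alpha=0.5))
# -> [[1.0, -0.25]]
```

When the two branches agree (z_pos == z_neg), the extrapolation term vanishes and the output equals z_pos, so guidance only acts where the negative prompt changes the attention features. For contrast, CFG applies a similar extrapolation to the final predictions of the two branches rather than inside attention layers, which is where the abstract locates its few-step failure.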

Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou, Yi-Zhe Song • 2025

Related benchmarks

Task	Dataset	Metric	Result	#
Text-to-Image Generation	NegGenBench (test)	Positive Score	100	22
Negative Concept Suppression	LLM-generated prompts	Suppression Rate (%)	52.25	10
Negative Concept Suppression	COCO-derived prompts	Suppression Rate (%)	64.25	10
Negative Concept Suppression	Combined (LLM-generated + COCO-derived)	Suppression Rate (%)	58.25	10
Negative Concept Suppression	DCS-Bench, COCO-derived prompts, FLUX (dev)	Suppression Rate (%)	72.5	10
Negative Concept Suppression	DCS-Bench, LLM-generated prompts, FLUX (dev)	Suppression Rate (%)	60.75	10
Negative Concept Suppression	DCS-Bench, Combined, FLUX (dev)	Suppression Rate (%)	66.62	10
Text-to-Image Generation	NVIDIA A6000 GPU environment	Inference Time (s)	45.02	9
Text-to-Image Generation	GenEval and MS-COCO, SDXL (test)	CLIP Score (CS)	27.1	7
Negative Concept Suppression	DCS-Bench	Human Preference Score	46.22	5
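The "Combined" suppression rows appear consistent with an unweighted mean of the two prompt subsets (LLM-generated and COCO-derived). This is an inference from the numbers, not something the listing states, and it would only match a pooled rate if both subsets are the same size. The helper combined_rate is hypothetical.

```python
def combined_rate(rate_a, rate_b):
    """Unweighted mean of two subset suppression rates (in %).

    Assumption: equal-sized subsets, so the pooled rate equals the plain mean.
    """
    return (rate_a + rate_b) / 2


print(combined_rate(52.25, 64.25))  # 58.25, matching the Combined row
print(combined_rate(72.5, 60.75))   # 66.625, listed as 66.62 for DCS-Bench on FLUX (dev)
```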

