Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models
About
Negative guidance -- explicitly suppressing unwanted attributes -- remains a fundamental challenge in diffusion models, particularly in few-step sampling regimes. While Classifier-Free Guidance (CFG) works well in standard settings, it fails under aggressive sampling step compression due to divergent predictions between positive and negative branches. We present Normalized Attention Guidance (NAG), an efficient, training-free mechanism that applies extrapolation in attention space with L1-based normalization and refinement. NAG restores effective negative guidance where CFG collapses while maintaining fidelity. Unlike existing approaches, NAG generalizes across architectures (UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image, video), functioning as a universal plug-in with minimal computational overhead. Through extensive experimentation, we demonstrate consistent improvements in text alignment (CLIP Score), fidelity (FID, PFID), and human-perceived quality (ImageReward). Our ablation studies validate each design component, while user studies confirm significant preference for NAG-guided outputs. As a model-agnostic inference-time approach requiring no retraining, NAG provides effortless negative guidance for all modern diffusion frameworks -- pseudocode in the Appendix!
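The abstract names three ingredients: extrapolation in attention space, L1-based normalization, and refinement. The NumPy sketch below is one plausible reading of how they could fit together, not the authors' implementation. The function name `nag_sketch`, the parameters `scale`, `tau`, and `alpha`, their default values, and the exact formulas are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def nag_sketch(z_pos, z_neg, scale=4.0, tau=2.5, alpha=0.25, eps=1e-8):
    """Illustrative sketch only; every formula here is an assumption.

    z_pos, z_neg: attention outputs of shape (tokens, channels) computed
    with the positive and negative prompt, respectively.
    scale, tau, alpha: hypothetical knobs with arbitrary default values.
    """
    # 1) Extrapolation: push the positive features away from the negative ones.
    z_ext = z_pos + scale * (z_pos - z_neg)

    # 2) L1-based normalization: measure per-token L1 norms and rescale the
    #    extrapolated features so their norm is at most tau times the
    #    positive branch's norm.
    norm_pos = np.abs(z_pos).sum(axis=-1, keepdims=True)
    norm_ext = np.abs(z_ext).sum(axis=-1, keepdims=True)
    ratio = norm_ext / (norm_pos + eps)
    z_norm = z_ext * np.minimum(ratio, tau) / (ratio + eps)

    # 3) Refinement: blend back toward the positive features.
    return alpha * z_norm + (1.0 - alpha) * z_pos


rng = np.random.default_rng(0)
z_pos = rng.normal(size=(4, 8))
z_neg = rng.normal(size=(4, 8))
guided = nag_sketch(z_pos, z_neg)
```

Under these assumptions, identical positive and negative branches leave the features essentially unchanged, and the norm cap keeps a large `scale` from inflating the attention features without bound, which is the kind of instability the abstract attributes to divergent branch predictions.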
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | NegGenBench (test) | Positive Score | 100 | 22 |
| Negative Concept Suppression | LLM-generated prompts | Suppression Rate (%) | 52.25 | 10 |
| Negative Concept Suppression | COCO derived prompts | Suppression (%) | 64.25 | 10 |
| Negative Concept Suppression | Combined LLM-generated + COCO-derived | Suppression Rate | 58.25 | 10 |
| Negative Concept Suppression | DCS-Bench COCO-derived prompts FLUX (dev) | Suppression Rate (%) | 72.5 | 10 |
| Negative Concept Suppression | DCS-Bench LLM-generated prompts on FLUX (dev) | Negative Concept Suppression (%) | 60.75 | 10 |
| Negative Concept Suppression | DCS-Bench Combined FLUX (dev) | Negative Concept Suppression (%) | 66.62 | 10 |
| Text-to-Image Generation | NVIDIA A6000 GPU Environment | Inference Time (s) | 45.02 | 9 |
| Text-to-Image Generation | GenEval and MS-COCO SDXL (test) | CLIP Score (CS) | 27.1 | 7 |
| Negative Concept Suppression | DCS-Bench | Human Preference Score | 46.22 | 5 |