SAVAA: Mitigating Hallucinations in LVLMs via Step-wise Adaptive Visual Attention Amplification

About

A line of recent training-free methods for mitigating hallucinations in large vision-language models (LVLMs) operates by amplifying attention to visual tokens during autoregressive generation within a single forward pass. We refer to this paradigm as visual attention amplification (VAA). In this paper, we identify a dual failure pattern in existing VAA methods caused by their use of a fixed amplification factor across generation steps: it can be too weak at some steps, leaving hallucinations unresolved, while too strong at others, introducing new hallucinations. Motivated by this finding, we propose Step-wise Adaptive Visual Attention Amplification (SAVAA), a new VAA framework that estimates hallucination risk for each generated token and uses the estimated risk to adaptively amplify visual attention at the next generation step. Specifically, we introduce Visual Grounding Entropy (VGE), a lightweight hallucination-risk estimator that augments predictive entropy with visual grounding, assigning higher risk to tokens that are uncertain, weakly grounded in the image, or both. Guided by VGE, SAVAA uses the estimated risk to calibrate the VAA factor for the next generation step, applying stronger amplification to higher-risk steps and weaker amplification to lower-risk steps. Across LLaVA-NeXT-7B, Qwen3-VL-8B, and InternVL3.5-8B, SAVAA significantly outperforms baseline methods on generative hallucination benchmarks such as CHAIR, SHR and AMBER. Code is available at: https://github.com/JiachengZ01/SAVVA.

Jiacheng Zhang, Feng Liu, Chao Du, Tianyu Pang• 2026

Related benchmarks

Task	Dataset	Result
Hallucination assessment	AMBER (test)	CHAIR6.5	38
Hallucination Mitigation	SHR	HSR22.1	15
Hallucination Evaluation	POPE MSCOCO, A-OKVQA, GQA average (Adversarial)	Accuracy84.09	15
Hallucination Evaluation	POPE MSCOCO, A-OKVQA, GQA average (Random)	Accuracy93.16	15
Hallucination Evaluation	POPE Popular MSCOCO, A-OKVQA, GQA average	Accuracy88.35	15
Object Hallucination Evaluation	CHAIR (val)	CHAIRs Score46	15

Showing 6 of 6 rows

Other info

Follow for update

@wizwand_team Discord