Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression

About

Despite their remarkable progress in multimodal understanding tasks, large vision-language models (LVLMs) often suffer from "hallucinations", generating text that is misaligned with the visual context. Existing methods that reduce hallucinations through inference-time intervention incur a significant increase in latency. To mitigate this, we present SPIN, a task-agnostic, attention-guided head suppression strategy that can be seamlessly integrated during inference without incurring any significant compute or latency overhead. We investigate whether hallucination in LVLMs can be linked to specific model components. Our analysis suggests that hallucinations can be attributed to a dynamic subset of attention heads in each layer. Leveraging this insight, for each text query token, we selectively suppress attention heads that exhibit low attention to image tokens, keeping the top-K attention heads intact. Extensive evaluations on visual question answering and image description tasks demonstrate the efficacy of SPIN in reducing hallucination scores by up to 2.7x while maintaining F1, and in improving throughput by 1.8x compared to existing alternatives. Code is available at https://github.com/YUECHE77/SPIN.
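The selection rule described in the abstract (per text query token, keep the top-K heads by attention to image tokens and suppress the rest) can be illustrated with a small sketch. This is not the authors' implementation; see the linked repository for that. The function names `head_suppression_scale` and `apply_head_scale`, the nested-list layout `[head][query][key]`, the use of summed post-softmax attention as the ranking score, and the suppression factor `alpha` are all assumptions made for illustration.

```python
def head_suppression_scale(attn, image_idx, top_k, alpha=0.0):
    """Return scale[h][q]: 1.0 for the top_k heads per query, alpha for the rest.

    attn: nested list [num_heads][num_queries][num_keys] of post-softmax weights.
    image_idx: key positions that hold image tokens (assumed known).
    alpha: hypothetical suppression factor; 0.0 silences a head entirely.
    """
    num_heads = len(attn)
    num_queries = len(attn[0])
    scale = [[alpha] * num_queries for _ in range(num_heads)]
    for q in range(num_queries):
        # Attention mass each head places on image tokens for this query token.
        image_mass = [sum(attn[h][q][k] for k in image_idx) for h in range(num_heads)]
        # Rank heads by image attention, highest first; ties go to the lower index.
        ranked = sorted(range(num_heads), key=lambda h: (-image_mass[h], h))
        for h in ranked[:top_k]:
            scale[h][q] = 1.0
    return scale


def apply_head_scale(head_outputs, scale):
    """Scale each head's per-query output vector before the heads are merged."""
    return [
        [[scale[h][q] * v for v in head_outputs[h][q]]
         for q in range(len(head_outputs[h]))]
        for h in range(len(head_outputs))
    ]


# Toy example: 3 heads, 1 text query token, 3 keys (keys 0 and 1 are image tokens).
attn = [
    [[0.7, 0.2, 0.1]],  # head 0 attends mostly to the image
    [[0.1, 0.1, 0.8]],  # head 1 attends mostly to text
    [[0.4, 0.3, 0.3]],  # head 2 is in between
]
scale = head_suppression_scale(attn, image_idx=[0, 1], top_k=2)
outputs = apply_head_scale([[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]], scale)
```

In the toy example, head 1 places the least attention on the image tokens, so with `top_k=2` it is the only head scaled by `alpha`. Because the ranking is recomputed for every query token, the suppressed subset can change from token to token, which matches the "dynamic subset" wording in the abstract.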

Sreetama Sarkar, Yue Che, Alex Gavin, Peter A. Beerel, Souvik Kundu • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multimodal Understanding | MMBench | -- | 847 |
| Visual Question Answering | OK-VQA (test) | Accuracy 65.58 | 327 |
| Object Hallucination Evaluation | MS-COCO (POPE Adversarial) | Accuracy 83.83 | 190 |
| Object Hallucination Evaluation | MS-COCO POPE (Popular) | Accuracy 85.83 | 158 |
| Multimodal Hallucination Evaluation | MMHal-Bench | Average Score 3.47 | 129 |
| Multimodal Understanding | MME | Score 1.82e+3 | 125 |
| Object Hallucination Evaluation | MS-COCO POPE Random | Accuracy 86.84 | 121 |
| Visual Question Answering | E-VQA (test) | Accuracy 57.95 | 85 |
| Visual Question Answering | InfoSeek (test) | Accuracy 45.35 | 81 |
| Object Hallucination Evaluation | MSCOCO 2014 (val) | CHAIRs 46.6 | 81 |
Showing 10 of 18 rows
