When Looking Is Not Enough: Visual Attention Structure Reveals Hallucination in MLLMs

About

Multimodal large language models (MLLMs) have become a key interface for visual reasoning and grounded question answering, yet they remain vulnerable to visual hallucinations, where generated responses contradict image content or mention nonexistent objects. A central challenge is that hallucination is not always caused by a simple lack of visual attention: the model may still assign substantial attention mass to image tokens while internally drifting toward an incorrect answer. In this paper, we show that the high-frequency structure of visual attention, measured by layer-wise Laplacian energy, reveals both the layer where hallucinated preferences emerge and the layer where the ground-truth answer transiently recovers. Building on this finding, we propose LaSCD (Laplacian-Spectral Contrastive Decoding), a training-free decoding strategy that selects informative layers via Laplacian energy and remaps next-token logits in closed form. Experiments on hallucination and general multimodal benchmarks show that LaSCD consistently reduces hallucination while preserving general capabilities, highlighting its potential as a faithful decoding paradigm. The code is available at https://github.com/macovaseas/LaSCD.
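The abstract names two ingredients: scoring each layer by the high-frequency (Laplacian) energy of its attention over image tokens, and a contrastive adjustment of next-token logits. The sketch below illustrates both under explicit assumptions. It is not the LaSCD implementation; for that, see the linked repository. Assumptions: head-averaged attention from the current query token to the image tokens is laid out on an H x W patch grid; "Laplacian energy" is taken to be the sum of squared 5-point discrete Laplacian values over the interior of that grid; the layer with the highest energy is selected (a hypothetical rule); and the logit remapping is a generic DoLa-style contrastive formula with an adaptive plausibility mask, used here in place of the paper's closed-form remap, which the abstract does not specify. All function names and the `alpha`/`beta` parameters are hypothetical.

```python
import numpy as np


def laplacian_energy(attn_map: np.ndarray) -> float:
    """Sum of squared 5-point discrete Laplacian values of a 2D map.

    Only interior cells are used, so no boundary padding is assumed.
    High values indicate sharp, high-frequency structure in the map.
    """
    lap = (
        attn_map[:-2, 1:-1]
        + attn_map[2:, 1:-1]
        + attn_map[1:-1, :-2]
        + attn_map[1:-1, 2:]
        - 4.0 * attn_map[1:-1, 1:-1]
    )
    return float(np.sum(lap ** 2))


def layerwise_energies(image_attn: np.ndarray, grid: tuple) -> np.ndarray:
    """Laplacian energy for every layer (hypothetical input layout).

    image_attn: shape (num_layers, num_image_tokens), head-averaged attention
    from the current query token to the image tokens.
    Each row is renormalized to sum to 1, so the energy reflects the spatial
    structure of attention rather than its total mass on image tokens.
    """
    h, w = grid
    energies = []
    for row in image_attn:
        dist = row / row.sum()
        energies.append(laplacian_energy(dist.reshape(h, w)))
    return np.array(energies)


def select_layer(image_attn: np.ndarray, grid: tuple) -> int:
    """Hypothetical selection rule: the layer with the highest energy."""
    return int(np.argmax(layerwise_energies(image_attn, grid)))


def contrastive_logits(final_logits, layer_logits, alpha=1.0, beta=0.1):
    """Generic contrastive decoding, NOT LaSCD's closed-form remap.

    Amplifies final-layer logits against the logits read out from a selected
    intermediate layer, then masks tokens whose final-layer probability is
    below beta times the most likely token's probability.
    """
    final_logits = np.asarray(final_logits, dtype=float)
    layer_logits = np.asarray(layer_logits, dtype=float)
    # Softmax of the final-layer logits (shifted for numerical stability).
    probs = np.exp(final_logits - final_logits.max())
    probs /= probs.sum()
    scores = (1.0 + alpha) * final_logits - alpha * layer_logits
    return np.where(probs >= beta * probs.max(), scores, -np.inf)
```

For example, on a 3 x 3 grid a layer whose attention is a single spike on the center patch has energy 16, while a uniform layer has energy 0, so `select_layer` picks the spiky layer. Contrasting against one selected layer keeps the remap to a single elementwise operation per decoding step; the plausibility mask prevents the subtraction from promoting tokens the final layer already considers very unlikely.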

Fanpu Cao, Xin Zou, Xuming Hu, Hui Xiong • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Hallucination Evaluation | CHAIR | CHAIRi Score | 11.8 | 154 |
| Object Hallucination Evaluation | POPE Popular | Accuracy | 87.45 | 96 |
| Object Probing | POPE (average) | Accuracy | 87.16 | 52 |
| Video Hallucination Evaluation | VideoHallucer | Overall Score | 64.8 | 46 |
| Multi-modal Vision-Language Evaluation | MMVet | Accuracy | 38.8 | 38 |
| Multi-modal Large Language Model Evaluation | MME | MME Hall Total Score | 709.9 | 24 |
| Object Probing | POPE (Random) | Accuracy | 90.57 | 20 |
| Object Probing | POPE Adversarial | Accuracy | 83.46 | 20 |
| Hallucination Evaluation | HallusionBench 1.0 (test) | fACC | 22.2 | 10 |
| General-purpose Visual Instruction Following | LLaVA-Bench In-the-Wild | Average Score | 49.3 | 9 |
