When Looking Is Not Enough: Visual Attention Structure Reveals Hallucination in MLLMs
About
Multimodal large language models (MLLMs) have become a key interface for visual reasoning and grounded question answering, yet they remain vulnerable to visual hallucinations, where generated responses contradict image content or mention nonexistent objects. A central challenge is that hallucination is not always caused by a simple lack of visual attention: the model may still assign substantial attention mass to image tokens while internally drifting toward an incorrect answer. In this paper, we show that the high-frequency structure of visual attention, measured by layer-wise Laplacian energy, reveals both the layer where hallucinated preferences emerge and the layer where the ground-truth answer transiently recovers. Building on this finding, we propose LaSCD (Laplacian-Spectral Contrastive Decoding), a training-free decoding strategy that selects informative layers via Laplacian energy and remaps next-token logits in closed form. Experiments on hallucination and general multimodal benchmarks show that LaSCD consistently reduces hallucination while preserving general capabilities, highlighting its potential as a faithful decoding paradigm. The code is available at https://github.com/macovaseas/LaSCD.
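To make the idea of "high-frequency structure of visual attention" concrete, here is a minimal sketch of one way a layer-wise Laplacian energy could be computed. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact definition, so the sketch assumes each layer's attention over image tokens has been reshaped to a 2-D patch grid and uses the standard graph-Laplacian quadratic form on a 4-neighbour grid. The function names `laplacian_energy` and `layerwise_energy` are hypothetical.

```python
import numpy as np


def laplacian_energy(attn_grid):
    """Quadratic form a^T L a for a 2-D attention map on a 4-neighbour grid.

    For an unweighted graph Laplacian L, a^T L a equals the sum of squared
    differences across all edges, so no explicit matrix is needed.
    ASSUMPTION: this is one common definition of Laplacian energy; the
    paper may normalise or define it differently.
    """
    a = np.asarray(attn_grid, dtype=float)
    horizontal = np.diff(a, axis=1)  # differences between left/right neighbours
    vertical = np.diff(a, axis=0)    # differences between up/down neighbours
    return float((horizontal ** 2).sum() + (vertical ** 2).sum())


def layerwise_energy(attn_by_layer):
    """Return one energy value per layer.

    attn_by_layer: iterable of 2-D arrays, one attention grid per layer
    (ASSUMPTION: already averaged over heads and restricted to image tokens).
    """
    return np.array([laplacian_energy(grid) for grid in attn_by_layer])
```

Under this definition a perfectly uniform attention map has zero energy, while a map that alternates sharply between neighbouring patches has high energy. How the energy profile is turned into a layer choice, and how the logits are then remapped, is specified in the paper and the linked repository rather than here.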
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Hallucination Evaluation | CHAIR | CHAIRi Score | 11.8 | 154 |
| Object Hallucination Evaluation | POPE Popular | Accuracy | 87.45 | 96 |
| Object Probing | POPE (average) | Accuracy | 87.16 | 52 |
| Video Hallucination Evaluation | VideoHallucer | Overall Score | 64.8 | 46 |
| Multi-modal Vision-Language Evaluation | MMVet | Accuracy | 38.8 | 38 |
| Multi-modal Large Language Model Evaluation | MME | MME Hall Total Score | 709.9 | 24 |
| Object Probing | POPE (Random) | Accuracy | 90.57 | 20 |
| Object Probing | POPE Adversarial | Accuracy | 83.46 | 20 |
| Hallucination Evaluation | HallusionBench 1.0 (test) | fACC | 22.2 | 10 |
| General-purpose Visual Instruction Following | LLaVA-Bench In-the-Wild | Average Score | 49.3 | 9 |