Hallucinatory Image Tokens: A Training-free EAZY Approach on Detecting and Mitigating Object Hallucinations in LVLMs
About
Despite their remarkable potential, Large Vision-Language Models (LVLMs) still face challenges with object hallucination, a problem where their generated outputs mistakenly incorporate objects that do not exist in the input image. While most prior work addresses this issue within the language-model backbone, ours shifts the focus to the image input source, investigating how specific image tokens contribute to hallucinations. Our analysis reveals a striking finding: a small subset of image tokens with high attention scores are the primary drivers of object hallucination. Removing these hallucinatory image tokens (only 1.5% of all image tokens) effectively mitigates the issue, and this finding holds consistently across models and datasets. Building on this insight, we introduce EAZY, a novel, training-free method that automatically identifies and Eliminates hAllucinations by Zeroing out hallucinatorY image tokens. Applied to unsupervised object hallucination detection, EAZY achieves a 15% improvement over previous methods. It also proves remarkably effective at mitigating hallucinations while preserving model utility, and it adapts seamlessly to various LVLM architectures.
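To make the zeroing step concrete, here is a minimal sketch of how high-attention image tokens could be masked before they reach the language-model backbone. The function and argument names (`zero_hallucinatory_tokens`, `attn_to_image`) are hypothetical, and the sketch assumes the attention scores from a suspected hallucinated-object token to each image token have already been extracted; it is an illustration of the idea, not the authors' implementation, which also covers how those tokens are identified in the first place.

```python
import torch

def zero_hallucinatory_tokens(image_embeds: torch.Tensor,
                              attn_to_image: torch.Tensor,
                              removal_ratio: float = 0.015) -> torch.Tensor:
    """Zero out the image tokens that receive the highest attention.

    image_embeds:  (num_image_tokens, hidden_dim) visual features fed to the LVLM
    attn_to_image: (num_image_tokens,) attention weights from a generated
                   object token to each image token
    removal_ratio: fraction of image tokens to zero (~1.5% per the paper)
    """
    num_tokens = image_embeds.size(0)
    k = max(1, int(round(num_tokens * removal_ratio)))
    # Indices of the k image tokens with the highest attention scores
    top_idx = torch.topk(attn_to_image, k).indices
    masked = image_embeds.clone()
    masked[top_idx] = 0.0  # zero out the suspected hallucinatory tokens
    return masked

# Toy usage: 576 image tokens (e.g., a 24x24 ViT patch grid), hidden size 4096
embeds = torch.randn(576, 4096)
attn = torch.rand(576)
clean_embeds = zero_hallucinatory_tokens(embeds, attn)  # ~9 tokens zeroed
```

Because the masking touches only the visual features and requires no gradient updates, it stays training-free and can be dropped in front of different LVLM architectures.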
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Hallucination Evaluation | POPE | -- | -- | 1455 |
| Object Hallucination Assessment | MS COCO (500 random samples) | CS Score | 38.8 | 25 |
| Object Hallucination Mitigation | MS COCO (500 random samples) | CS Score | 26.6 | 23 |
| Caption Hallucination Evaluation | CHAIR | CS Score | 38.8 | 20 |
| Multimodal Perception Evaluation | MME | Existence Score | 184.6 | 16 |
| Object Hallucination Detection | MS-COCO 2014 (val) | Accuracy | 78.77 | 5 |