
Hallucinatory Image Tokens: A Training-free EAZY Approach on Detecting and Mitigating Object Hallucinations in LVLMs

About

Despite their remarkable potential, Large Vision-Language Models (LVLMs) still face challenges with object hallucination, a problem where their generated outputs mistakenly incorporate objects that do not actually exist. Although most prior work addresses this issue within the language-model backbone, our work shifts the focus to the image input source, investigating how specific image tokens contribute to hallucinations. Our analysis reveals a striking finding: a small subset of image tokens with high attention scores are the primary drivers of object hallucination. By removing these hallucinatory image tokens (only 1.5% of all image tokens), the issue can be effectively mitigated. This finding holds consistently across different models and datasets. Building on this insight, we introduce EAZY, a novel, training-free method that automatically identifies and Eliminates hAllucinations by Zeroing out hallucinatorY image tokens. We apply EAZY to unsupervised object hallucination detection, achieving a 15% improvement over previous methods. Additionally, EAZY demonstrates remarkable effectiveness in mitigating hallucinations while preserving model utility and seamlessly adapting to various LVLM architectures.

Liwei Che, Tony Qingze Liu, Jing Jia, Weiyi Qin, Ruixiang Tang, Vladimir Pavlovic• 2025
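The core mechanism described in the abstract, zeroing out the highest-attention image tokens (about 1.5% of them), can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of NumPy, and the assumption that per-token attention scores have already been aggregated (e.g. across heads and layers) are all ours.

```python
import numpy as np

def zero_hallucinatory_tokens(image_tokens, attn_scores, frac=0.015):
    """Zero out the top-`frac` fraction of image tokens by attention score.

    A hedged sketch of the idea behind EAZY; how the scores are obtained
    and which layers they come from are assumptions, not the paper's spec.
    image_tokens: (num_tokens, dim) array of image token embeddings.
    attn_scores:  (num_tokens,) aggregated attention score per image token.
    """
    tokens = image_tokens.copy()
    k = max(1, int(round(frac * len(attn_scores))))  # ~1.5% of tokens, at least one
    top_idx = np.argsort(attn_scores)[-k:]           # indices of highest-attention tokens
    tokens[top_idx] = 0.0                            # zero them out before decoding
    return tokens, top_idx
```

In a real LVLM pipeline the masked token sequence would then be fed back through the decoder in place of the original image tokens; being training-free, the whole intervention happens at inference time.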

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Object Hallucination Evaluation | POPE | -- | 1455 |
| Object Hallucination Assessment | MSCOCO (500 random samples) | Cs: 38.8 | 25 |
| Object Hallucination Mitigation | MS COCO (500 random samples) | Cs Score: 26.6 | 23 |
| Caption Hallucination Evaluation | CHAIR | Cs Score: 38.8 | 20 |
| Multimodal Perception Evaluation | MME | Existence Score: 184.6 | 16 |
| Object Hallucination Detection | MS-COCO 2014 (val) | Accuracy: 78.77 | 5 |
