HII-DPO: Eliminate Hallucination via Accurate Hallucination-Inducing Counterfactual Images

About

Large Vision-Language Models (VLMs) have achieved remarkable success across diverse multimodal tasks but remain vulnerable to hallucinations rooted in inherent language bias. Despite recent progress, existing hallucination mitigation methods often overlook the underlying hallucination patterns driven by language bias. In this work, we design a novel pipeline to accurately synthesize Hallucination-Inducing Images (HIIs). Using synthesized HIIs, we reveal a consistent scene-conditioned hallucination pattern: models tend to mention objects that are highly typical of the scene even when visual evidence is removed. To quantify the susceptibility of VLMs to this hallucination pattern, we establish the Masked-Object-Hallucination (MOH) benchmark to rigorously evaluate existing state-of-the-art alignment frameworks. Finally, we leverage HIIs to construct high-quality preference datasets for fine-grained alignment. Experimental results demonstrate that our approach effectively mitigates hallucinations while preserving general model capabilities. Specifically, our method achieves up to a 38% improvement over the current state-of-the-art on standard hallucination benchmarks.
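The HII synthesis and preference-data construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the zero-fill masking stands in for the paper's counterfactual image synthesis (a real pipeline would inpaint the region so the image stays natural), and the toy pixel-grid image representation, the substring-based hallucination check, and all function names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PreferencePair:
    """A preference pair for DPO-style alignment, built from an HII query."""
    prompt: str
    chosen: str    # response consistent with the counterfactual image
    rejected: str  # hallucinated response mentioning the removed object

def make_hii(image: list[list[int]], bbox: tuple[int, int, int, int]) -> list[list[int]]:
    """Remove an object's visual evidence by blanking its bounding box.

    `image` is a toy H x W grid of pixel values; a real pipeline would
    inpaint the region to keep the counterfactual image natural.
    """
    x0, y0, x1, y1 = bbox
    hii = [row[:] for row in image]        # copy so the original is untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            hii[y][x] = 0                  # placeholder for inpainted background
    return hii

def hallucinated(response: str, masked_object: str) -> bool:
    """Crude check: does the response still mention the masked object?"""
    return masked_object.lower() in response.lower()

def build_pair(prompt: str, hii_response: str, clean_response: str,
               masked_object: str) -> Optional[PreferencePair]:
    """Keep a pair only when the model hallucinates the masked object."""
    if hallucinated(hii_response, masked_object):
        return PreferencePair(prompt, chosen=clean_response, rejected=hii_response)
    return None
```

For example, querying a VLM on `make_hii(img, dog_bbox)` with "Describe the scene" and getting back "A dog on the grass" would be flagged as a scene-conditioned hallucination; pairing it against a description that omits the dog yields one preference example for fine-grained alignment.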

Yilin Yang, Zhenghui Guo, Yuke Wang, Omprakash Gnawali, Sheng Di, Chengming Zhang • 2026

Related benchmarks

Task                           Dataset          Result                 Rank
Visual Question Answering      VQA v2           Accuracy: 79.5         1165
Visual Question Answering      TextVQA          Accuracy: 82.3         1117
Multimodal Evaluation          MM-Vet           Accuracy: 69.8         122
Hallucination Evaluation       HallusionBench   --                     93
Hallucination Evaluation       AMBER            --                     71
Science Question Answering     ScienceQA        IMG Score: 88.5        49
Vision-Language Understanding  MM-Vet           Total Score: 36.2      43
Hallucination Evaluation       Object-HalBench  CHAIR Score (s): 4.2   28
Hallucination Evaluation       MOH              HR^D: 43.8             21
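The MOH entry above reports a hallucination rate (HR^D), which can be read as the share of masked objects a model still mentions when queried on HIIs. A minimal sketch of such a metric, assuming a simple substring match and percentage aggregation (the benchmark's exact HR^D definition is not given on this page):

```python
def hallucination_rate(results: list[tuple[str, str]]) -> float:
    """Percentage of HII responses that mention the masked object.

    `results` holds (model_response, masked_object_name) pairs collected
    by querying the model on hallucination-inducing images.
    """
    if not results:
        return 0.0
    hits = sum(obj.lower() in resp.lower() for resp, obj in results)
    return 100.0 * hits / len(results)
```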
