HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
About
While large vision-language models (LVLMs) have demonstrated impressive capabilities in interpreting multi-modal contexts, they invariably suffer from object hallucinations (OH). We introduce HALC, a novel decoding algorithm designed to mitigate OH in LVLMs. HALC leverages distinct fine-grained optimal visual information in vision-language tasks and operates on both local and global contexts simultaneously. Specifically, HALC integrates a robust auto-focal grounding mechanism (local) that corrects hallucinated tokens on the fly, and a specialized beam search algorithm (global) that significantly reduces OH while preserving text generation quality. Moreover, HALC can be integrated into any LVLM as a plug-and-play module without extra training. Extensive experiments demonstrate the effectiveness of HALC in reducing OH, outperforming state-of-the-art methods across four benchmarks.
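The local correction step can be illustrated with a generic visual-contrastive decoding sketch: next-token logits conditioned on a tightly grounded visual crop are contrasted against logits from a broader, weakly grounded view, so tokens favored only by the weak view (a typical hallucination signature) are demoted. The function names, the single-pair contrast, and the `alpha` weighting below are illustrative assumptions, not the authors' implementation, which adaptively selects among multiple fields of view and combines the contrast with beam search.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D logit vector."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def focal_contrast_next_token(logits_focal, logits_context, alpha=1.0):
    """Pick the next token by contrasting two conditional distributions.

    logits_focal   -- logits conditioned on a focused, grounded visual crop
    logits_context -- logits conditioned on a broader, less informative view
    alpha          -- contrast strength (hypothetical knob for this sketch)

    Tokens whose probability rises only under the weakly grounded view
    are penalized; tokens supported by the focused view are amplified.
    """
    contrast = (1.0 + alpha) * log_softmax(logits_focal) \
               - alpha * log_softmax(logits_context)
    return int(np.argmax(contrast))

# Toy example: token 1 is boosted only by the broad view, so the
# contrast demotes it and the grounded token 0 is selected.
focal = np.array([2.0, 1.0, 0.1])
context = np.array([2.0, 2.0, 0.1])
print(focal_contrast_next_token(focal, context))  # → 0
```

In a full decoder this score would replace the raw logits at each step, with the beam search keeping several candidate continuations so a single over-aggressive contrast does not derail generation quality.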
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Hallucination Evaluation | POPE | Accuracy | 84 | 935 |
| Multimodal Evaluation | MME | -- | -- | 557 |
| Object Hallucination | POPE (Random) | F1 Score | 73.44 | 200 |
| Object Hallucination | POPE (Adversarial) | Accuracy | 55.53 | 196 |
| Object Hallucination | POPE (Popular) | F1 Score | 69.31 | 188 |
| Hallucination Evaluation | CHAIR | CHAIR_s | 53.8 | 166 |
| Visual Hallucination Evaluation | MSCOCO | CHAIR_i | 15.7 | 104 |
| Object Hallucination Evaluation | POPE (Popular, offline) | F1 Score | 82.4 | 84 |
| Object Hallucination Evaluation | POPE (Random, offline) | F1 Score | 72.16 | 84 |
| Object Hallucination Evaluation | POPE (Adversarial, offline) | F1 Score | 68.04 | 84 |