
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding

About

While large vision-language models (LVLMs) have demonstrated impressive capabilities in interpreting multi-modal contexts, they invariably suffer from object hallucinations (OH). We introduce HALC, a novel decoding algorithm designed to mitigate OH in LVLMs. HALC leverages distinct fine-grained optimal visual information in vision-language tasks and operates on local and global contexts simultaneously. Specifically, HALC integrates a robust auto-focal grounding mechanism (locally) to correct hallucinated tokens on the fly, and a specialized beam search algorithm (globally) to significantly reduce OH while preserving text generation quality. Additionally, HALC can be integrated into any LVLM as a plug-and-play module without extra training. Extensive experimental studies demonstrate the effectiveness of HALC in reducing OH, outperforming state-of-the-art methods across four benchmarks.
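As a rough illustration of the two levels described above, the sketch below wires a local focal-contrast token scorer into a global beam search. It is a minimal toy, not HALC's actual implementation: the model interface (`score_tokens`), the toy vocabulary, the field-of-view scales, and the contrast weighting `alpha` are all hypothetical placeholders standing in for a real LVLM and grounding module.

```python
# Hypothetical sketch of focal-contrast decoding combined with beam search.
# All names (score_tokens, focal_contrast_scores, VOCAB) are placeholders,
# not HALC's actual API.
import numpy as np

VOCAB = ["a", "dog", "cat", "frisbee", "grass", "<eos>"]  # toy vocabulary


def score_tokens(fov_scale: float, prefix: tuple) -> np.ndarray:
    """Placeholder LVLM head: returns logits over VOCAB given a field-of-view
    crop scale and a text prefix. A real system would run the vision-language
    model on the cropped image here; this toy just draws seeded random logits."""
    seed = int(fov_scale * 100) + 7 * len(prefix)
    return np.random.default_rng(seed).normal(size=len(VOCAB))


def softmax(x: np.ndarray) -> np.ndarray:
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()


def focal_contrast_scores(prefix: tuple, fine_fov: float = 0.5,
                          coarse_fov: float = 1.0, alpha: float = 1.0) -> np.ndarray:
    """Local step (sketch): contrast token distributions from a fine crop around
    the grounded region against a coarser view; tokens whose probability drops
    when visual evidence is sharpened are down-weighted."""
    p_fine = softmax(score_tokens(fine_fov, prefix))
    p_coarse = softmax(score_tokens(coarse_fov, prefix))
    return np.log(p_fine + 1e-9) - alpha * np.log(p_coarse + 1e-9)


def beam_search(max_len: int = 5, beam_width: int = 3) -> tuple:
    """Global step (sketch): standard beam search over the contrast-adjusted
    token scores, keeping the beam_width highest-scoring prefixes."""
    beams = [((), 0.0)]  # (token prefix, cumulative score)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == "<eos>":
                candidates.append((prefix, score))
                continue
            token_scores = focal_contrast_scores(prefix)
            for idx in np.argsort(token_scores)[-beam_width:]:
                candidates.append((prefix + (VOCAB[idx],), score + token_scores[idx]))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]


if __name__ == "__main__":
    print(" ".join(beam_search()))
```

In this sketch the contrast term plays the role of the local hallucination correction while the beam keeps globally plausible captions; in the actual method the visual crops come from an adaptive grounding step rather than fixed scales.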

Zhaorun Chen, Zhuokai Zhao, Hongyin Luo, Huaxiu Yao, Bo Li, Jiawei Zhou • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Object Hallucination Evaluation | POPE | Accuracy | 84 | 935
Multimodal Evaluation | MME | -- | -- | 557
Object Hallucination | POPE (Random) | F1 Score | 73.44 | 200
Object Hallucination | POPE Adversarial | Accuracy | 55.53 | 196
Object Hallucination | POPE Popular | F1 Score | 69.31 | 188
Hallucination Evaluation | CHAIR | CHAIR_s | 53.8 | 166
Visual Hallucination Evaluation | MSCOCO | CHAIR_i | 15.7 | 104
Object Hallucination Evaluation | POPE Popular offline | F1 Score | 82.4 | 84
Object Hallucination Evaluation | POPE Random offline | F1 Score | 72.16 | 84
Object Hallucination Evaluation | POPE Adversarial offline | F1 Score | 68.04 | 84

Showing 10 of 30 rows
