HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
About
While large vision-language models (LVLMs) have demonstrated impressive capabilities in interpreting multi-modal contexts, they invariably suffer from object hallucinations (OH). We introduce HALC, a novel decoding algorithm designed to mitigate OH in LVLMs. HALC leverages distinct fine-grained optimal visual information in vision-language tasks and operates on both local and global contexts simultaneously. Specifically, HALC integrates a robust auto-focal grounding mechanism (local) that corrects hallucinated tokens on the fly, and a specialized beam search algorithm (global) that significantly reduces OH while preserving text generation quality. Moreover, HALC can be integrated into any LVLM as a plug-and-play module without extra training. Extensive experiments demonstrate the effectiveness of HALC in reducing OH, outperforming state-of-the-art methods across four benchmarks.
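The local correction step can be illustrated with a generic visual-contrastive decoding sketch: next-token logits conditioned on a tightly grounded visual crop are contrasted against logits from a broader, weakly grounded view, so tokens favored only by the weak view (a typical hallucination signature) are demoted. The function names, the single-pair contrast, and the `alpha` weighting below are illustrative assumptions, not the authors' implementation, which adaptively selects among multiple fields of view and combines the contrast with beam search.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D logit vector."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def focal_contrast_next_token(logits_focal, logits_context, alpha=1.0):
    """Pick the next token by contrasting two conditional distributions.

    logits_focal   -- logits conditioned on a focused, grounded visual crop
    logits_context -- logits conditioned on a broader, less informative view
    alpha          -- contrast strength (hypothetical knob for this sketch)

    Tokens whose probability rises only under the weakly grounded view
    are penalized; tokens supported by the focused view are amplified.
    """
    contrast = (1.0 + alpha) * log_softmax(logits_focal) \
               - alpha * log_softmax(logits_context)
    return int(np.argmax(contrast))

# Toy example: token 1 is boosted only by the broad view, so the
# contrast demotes it and the grounded token 0 is selected.
focal = np.array([2.0, 1.0, 0.1])
context = np.array([2.0, 2.0, 0.1])
print(focal_contrast_next_token(focal, context))  # → 0
```

In a full decoder this score would replace the raw logits at each step, with the beam search keeping several candidate continuations so a single over-aggressive contrast does not derail generation quality.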
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Hallucination Evaluation | POPE | Accuracy | 84 | 935 |
| Multimodal Evaluation | MME | -- | -- | 557 |
| Object Hallucination | POPE (Random) | F1 Score | 73.44 | 200 |
| Object Hallucination | POPE (Adversarial) | Accuracy | 55.53 | 196 |
| Object Hallucination | POPE (Popular) | F1 Score | 69.31 | 188 |
| Hallucination Evaluation | CHAIR | CHAIR_s | 53.8 | 166 |
| Visual Hallucination Evaluation | MSCOCO | CHAIR_i | 15.7 | 104 |
| Object Hallucination Evaluation | POPE (Popular, offline) | F1 Score | 82.4 | 84 |
| Object Hallucination Evaluation | POPE (Random, offline) | F1 Score | 72.16 | 84 |
| Object Hallucination Evaluation | POPE (Adversarial, offline) | F1 Score | 68.04 | 84 |