CoFi-Dec: Hallucination-Resistant Decoding via Coarse-to-Fine Generative Feedback in Large Vision-Language Models
About
Large Vision-Language Models (LVLMs) have achieved impressive progress in multi-modal understanding and generation. However, they still tend to produce hallucinated content that is inconsistent with the visual input, which limits their reliability in real-world applications. We propose CoFi-Dec, a training-free decoding framework that mitigates hallucinations by integrating generative self-feedback with coarse-to-fine visual conditioning. Inspired by the human visual process from global scene perception to detailed inspection, CoFi-Dec first generates two intermediate textual responses conditioned on coarse- and fine-grained views of the original image. These responses are then transformed into synthetic images using a text-to-image model, forming multi-level visual hypotheses that enrich grounding cues. To unify the predictions from these multiple visual conditions, we introduce a Wasserstein-based fusion mechanism that aligns their predictive distributions into a geometrically consistent decoding trajectory. This principled fusion reconciles high-level semantic consistency with fine-grained visual grounding, leading to more robust and faithful outputs. Extensive experiments on six hallucination-focused benchmarks show that CoFi-Dec substantially reduces both entity-level and semantic-level hallucinations, outperforming existing decoding strategies. The framework is model-agnostic, requires no additional training, and can be seamlessly applied to a wide range of LVLMs. The implementation is available at https://github.com/AI-Researcher-Team/CoFi-Dec.
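To make the idea of a Wasserstein-based fusion of predictive distributions more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the fusion can be illustrated as an entropy-regularized Wasserstein barycenter of several next-token distributions, computed with iterative Bregman projections. The function name `entropic_barycenter`, the toy token embeddings, the squared-Euclidean ground cost, and the parameters `eps` and `n_iter` are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def entropic_barycenter(dists, cost, weights=None, eps=0.5, n_iter=200):
    """Entropy-regularized Wasserstein barycenter of discrete distributions.

    dists:   (k, n) array-like, each row a probability vector over n tokens
    cost:    (n, n) ground cost between tokens (an assumption of this sketch)
    weights: optional (k,) barycenter weights summing to 1 (uniform if None)
    """
    P = np.asarray(dists, dtype=float)
    k, n = P.shape
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    K = np.exp(-np.asarray(cost, dtype=float) / eps)  # Gibbs kernel, strictly positive
    v = np.ones((k, n))
    b = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        u = P / (v @ K.T)             # row i holds p_i / (K v_i)
        Ktu = u @ K                   # row i holds K^T u_i
        b = np.exp(w @ np.log(Ktu))   # weighted geometric mean across the k inputs
        v = b / Ktu                   # rescale so every column marginal matches b
    return b / b.sum()

# Toy example: three hypothetical next-token distributions over a 4-token
# vocabulary, standing in for the original, coarse, and fine visual conditions.
token_emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # made-up embeddings
diff = token_emb[:, None, :] - token_emb[None, :, :]
cost = (diff ** 2).sum(axis=-1)       # squared Euclidean ground cost

p_original = [0.70, 0.10, 0.10, 0.10]
p_coarse = [0.40, 0.40, 0.10, 0.10]
p_fine = [0.55, 0.15, 0.20, 0.10]

fused = entropic_barycenter([p_original, p_coarse, p_fine], cost)
print(fused)
```

Unlike a plain weighted average of probabilities, a barycenter of this kind takes the ground cost between tokens into account, so probability mass can settle on tokens that lie between the inputs' modes. Note that the entropic regularization blurs the result, so `eps` trades sharpness against numerical stability.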
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Hallucination Evaluation | MS-COCO (POPE Adversarial) | Accuracy | 84.36 | 80 |
| Object Hallucination Evaluation | MS-COCO POPE (Popular) | Accuracy | 87.67 | 76 |
| Object Hallucination Evaluation | MS-COCO POPE Random | Accuracy | 90.33 | 55 |
| Multimodal Reasoning | MMBench | Accuracy | 65.9 | 50 |
| Object Hallucination Evaluation | A-OKVQA POPE Popular | Accuracy | 87.71 | 36 |
| Object Hallucination Evaluation | A-OKVQA POPE Random | Accuracy | 88.94 | 36 |
| Object Hallucination Evaluation | POPE GQA Popular | Accuracy | 83.54 | 30 |
| Object Hallucination Probing | GQA POPE Random | Accuracy (GQA POPE) | 89.03 | 26 |
| Hallucination Evaluation | MME Hallucination | Existence Score | 190.3 | 18 |
| Object Hallucination Assessment | A-OKVQA POPE (Adversarial) | Accuracy | 0.8126 | 18 |