Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
About
Multimodal large language models have made significant advancements in recent years, yet they still suffer from a common issue known as the "hallucination problem", in which the models generate textual descriptions that inaccurately depict or entirely fabricate content from the associated images. This paper introduces a novel solution, Hallucination-Aware Direct Preference Optimization (HA-DPO), which reframes the hallucination problem as a preference selection task. The model is trained to favor the non-hallucinating response when presented with two responses to the same image (one accurate and one hallucinatory). Furthermore, this paper proposes an efficient pipeline for constructing positive (non-hallucinatory) and negative (hallucinatory) sample pairs, ensuring a high-quality, style-consistent dataset for robust preference learning. When applied to three mainstream multimodal models, HA-DPO significantly reduced hallucination and improved the models' generalization capabilities. Notably, the MiniGPT-4 model, when enhanced with HA-DPO, demonstrated a substantial improvement: POPE accuracy rose from 51.13% to 86.13% (an absolute improvement of 35 percentage points), and the MME score rose from 932.00 to 1326.46 (a relative improvement of 42.32%). The code, models, and datasets are available at https://opendatalab.github.io/HA-DPO.
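To make the "preference selection" framing concrete, the following is a minimal sketch of the standard DPO objective for a single response pair. It assumes the summed token log-probabilities of each response under the trained policy and under a frozen reference model have already been computed. The function names, argument names, and the default `beta` are illustrative assumptions, and the sketch shows generic DPO rather than the exact training loss used by HA-DPO.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,    # log pi_theta(non-hallucinatory response | image, prompt)
    policy_rejected_logp: float,  # log pi_theta(hallucinatory response | image, prompt)
    ref_chosen_logp: float,       # same quantities under the frozen reference model
    ref_rejected_logp: float,
    beta: float = 0.1,            # assumed value; controls deviation from the reference
) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair."""
    # Implicit rewards: scaled log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # Hypothetical log-probabilities: the policy already prefers the accurate
    # response slightly more than the reference does, so the loss is below log(2).
    print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

The loss equals log 2 when the policy and the reference agree on both responses, and it decreases as the policy assigns relatively more probability to the non-hallucinatory response than the reference does.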
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Object Hallucination Evaluation | POPE | -- | 2019 |
| Visual Question Answering | VizWiz | Accuracy 53.9 | 1820 |
| Visual Question Answering | TextVQA | Accuracy 56.7 | 1453 |
| Visual Question Answering | VQA v2 | Accuracy 77.6 | 1429 |
| Text-based Visual Question Answering | TextVQA | Accuracy 58 | 962 |
| Science Question Answering | ScienceQA | Accuracy 68.1 | 791 |
| Multimodal Evaluation | MME | -- | 727 |
| Hallucination Evaluation | CHAIR | CHAIR_s 46.5 | 393 |
| Multimodal Capability Evaluation | MM-Vet | Score 30.9 | 393 |
| Hallucination Evaluation | MMHal-Bench | MMHal Score 1.98 | 306 |