Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

About

Multimodal large language models have made significant advances in recent years, yet they still suffer from a common issue known as the "hallucination problem", in which the models generate textual descriptions that inaccurately depict or entirely fabricate content from the associated images. This paper introduces a novel solution, Hallucination-Aware Direct Preference Optimization (HA-DPO), which reframes the hallucination problem as a preference selection task: when presented with two responses to the same image (one accurate and one hallucinatory), the model is trained to favor the non-hallucinatory response. Furthermore, this paper proposes an efficient pipeline for constructing positive (non-hallucinatory) and negative (hallucinatory) sample pairs, ensuring a high-quality, style-consistent dataset for robust preference learning. When applied to three mainstream multimodal models, HA-DPO significantly reduced hallucination and improved the models' generalization capabilities. Notably, MiniGPT-4 enhanced with HA-DPO showed a substantial improvement: POPE accuracy rose from 51.13% to 86.13% (an absolute gain of 35 percentage points), and the MME score rose from 932.00 to 1326.46 (a relative improvement of 42.32%). The code, models, and datasets are available at https://opendatalab.github.io/HA-DPO.
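HA-DPO builds on Direct Preference Optimization. The abstract does not spell out the training loss, so the following is only a minimal sketch, assuming the standard DPO objective: for each image and prompt, the trainable model and a frozen reference model both score the non-hallucinatory (positive) and hallucinatory (negative) responses, and the loss pushes the positive response's policy-to-reference likelihood ratio above the negative one's. The function names, the per-response log-probability inputs, and `beta = 0.1` are illustrative assumptions, not details taken from the paper.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_logp_pos: float, policy_logp_neg: float,
             ref_logp_pos: float, ref_logp_neg: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (image, prompt) preference pair (hypothetical helper).

    *_pos: summed token log-probability of the non-hallucinatory (preferred) response
    *_neg: summed token log-probability of the hallucinatory (rejected) response
    policy_*: scored by the model being trained; ref_*: by a frozen reference copy
    beta: assumed temperature limiting how far the policy may drift from the reference
    """
    # Log-ratio of policy to reference for each response.
    pos_ratio = policy_logp_pos - ref_logp_pos
    neg_ratio = policy_logp_neg - ref_logp_neg
    margin = beta * (pos_ratio - neg_ratio)
    # -log(sigmoid(margin)) rewritten as softplus(-margin) for numerical stability.
    return softplus(-margin)


def dpo_batch_loss(pairs, beta: float = 0.1) -> float:
    """Mean DPO loss over (policy_pos, policy_neg, ref_pos, ref_neg) tuples."""
    return sum(dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)


if __name__ == "__main__":
    # Policy identical to the reference: the margin is 0, so the loss is log 2.
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy raises the positive response's likelihood: the loss drops below log 2.
    print(dpo_loss(-9.0, -12.0, -10.0, -12.0))
```

Working with scalar log-probabilities keeps the sketch dependency-free; an actual training loop would evaluate the same expression on batched tensors and backpropagate only through the policy terms, keeping the reference model frozen.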

Zhiyuan Zhao, Bin Wang, Linke Ouyang, Xiaoyi Dong, Jiaqi Wang, Conghui He• 2023

Related benchmarks

Task | Dataset | Result | Rank
Object Hallucination Evaluation | POPE | -- | 2019
Visual Question Answering | VizWiz | Accuracy 53.9 | 1820
Visual Question Answering | TextVQA | Accuracy 56.7 | 1453
Visual Question Answering | VQA v2 | Accuracy 77.6 | 1429
Text-based Visual Question Answering | TextVQA | Accuracy 58 | 962
Science Question Answering | ScienceQA | Accuracy 68.1 | 791
Multimodal Evaluation | MME | -- | 727
Hallucination Evaluation | CHAIR | CHAIR_s 46.5 | 393
Multimodal Capability Evaluation | MM-Vet | Score 30.9 | 393
Hallucination Evaluation | MMHal-Bench | MMHal Score 1.98 | 306

Showing 10 of 48 rows.