Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization

About

Direct Preference Optimization (DPO) has emerged as an effective approach for mitigating hallucination in Multimodal Large Language Models (MLLMs). Existing methods have made significant progress by using vision-oriented contrastive objectives to strengthen MLLMs' attention to visual inputs and thereby reduce hallucination, but they suffer from a non-rigorous optimization objective and indirect preference supervision. To address these limitations, we propose Symmetric Multimodal Preference Optimization (SymMPO), which performs symmetric preference learning with direct preference supervision (i.e., response pairs) to enhance visual understanding while maintaining rigorous theoretical alignment with standard DPO. In addition to conventional ordinal preference learning, SymMPO introduces a preference margin consistency loss that quantitatively regulates the preference gap between symmetric preference pairs. Comprehensive evaluation across five benchmarks demonstrates SymMPO's superior performance, validating its effectiveness in mitigating hallucination in MLLMs.

Wenqi Liu, Xuemeng Song, Jiaxi Li, Yinwei Wei, Na Zheng, Jianhua Yin, Liqiang Nie • 2025
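
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not SymMPO itself: the symmetric pairing and the preference margin consistency loss are not specified on this page, so they are not reproduced here. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, and the inputs are assumed to be summed log-probabilities of each response under the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    All *_logp arguments are sequence log-probabilities (floats).
    Returns (loss, margin), where margin is the implicit reward gap.
    Names and the default beta are illustrative, not from the paper.
    """
    # Implicit rewards: scaled log-ratio of policy to reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return loss, margin


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
    # Policy prefers the chosen response more than the reference does.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The returned margin is the quantity that ordinal preference learning pushes to be positive; a margin consistency term of the kind the abstract describes would presumably constrain such margins across paired examples, but its exact form should be taken from the paper.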

Related benchmarks

Task	Dataset	Result
Visual Perception	BLINK	--	255
Visual Hallucination Evaluation	HallusionBench	--	156
Hallucination Evaluation	HallusionBench	--	153
Visual Perception and Reasoning	BLINK	Accuracy: 47.89	70
Multi-modal Hallucination Evaluation	AMBER	--	28
Robustness	R-Bench	R-Bench Dis Metric: 61.01	13
General Multimodal Evaluation	Macro-average of HallusionBench, AMBER, CRPE, R-Bench, and BLINK	Overall Score: 62.11	13
Multimodal Hallucination Evaluation	CRPE	Existence Score: 95.67	13
Compositional Reasoning and Perception Evaluation	CRPE	Exist Score: 92.47	13
Multimodal Hallucination Evaluation	R-Bench	Dis: 64.24	13
