MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization

About

Omni-modal large language models (omni LLMs) have recently achieved strong performance across audiovisual understanding tasks, yet they remain highly susceptible to cross-modal hallucinations arising from spurious correlations and dominant language priors. In this work, we propose Modality-Decoupled Direct Preference Optimization (MoD-DPO), a simple and effective framework for improving modality grounding in omni LLMs. MoD-DPO introduces modality-aware regularization terms that explicitly enforce invariance to corruptions in irrelevant modalities and sensitivity to perturbations in relevant modalities, thereby reducing unintended cross-modal interactions. To further mitigate over-reliance on textual priors, we incorporate a language-prior debiasing penalty that discourages hallucination-prone text-only responses. Extensive experiments across multiple audiovisual hallucination benchmarks demonstrate that MoD-DPO consistently improves perception accuracy and hallucination resistance, outperforming previous preference optimization baselines under similar training budgets. Our findings underscore the importance of modality-faithful alignment and demonstrate a scalable path toward more reliable and resilient multimodal foundation models.

Ashutosh Chaubey, Jiacheng Pang, Mohammad Soleymani • 2026
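
The abstract describes the objective only in words, so the following is a minimal sketch of how such a loss could be assembled, not the authors' formulation. Only `dpo_loss` follows the standard DPO formula. The three penalty functions, their names, their functional forms, and the weights `lam_inv`, `lam_sens`, and `lam_lp` are hypothetical and chosen purely for illustration. All inputs are assumed to be summed sequence log-probabilities of a response under different input conditions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pi_w: float, pi_l: float, ref_w: float, ref_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    pi_* are policy log-probs, ref_* are frozen reference log-probs.
    """
    margin = (pi_w - ref_w) - (pi_l - ref_l)
    return -log_sigmoid(beta * margin)


def invariance_penalty(logp_clean: float, logp_irrelevant_corrupted: float) -> float:
    """Hypothetical: squared change in log-prob when the irrelevant modality is corrupted."""
    return (logp_clean - logp_irrelevant_corrupted) ** 2


def sensitivity_penalty(logp_clean: float, logp_relevant_corrupted: float,
                        beta: float = 0.1) -> float:
    """Hypothetical: small when corrupting the relevant modality lowers the log-prob."""
    return -log_sigmoid(beta * (logp_clean - logp_relevant_corrupted))


def language_prior_penalty(logp_clean: float, logp_text_only: float,
                           beta: float = 0.1) -> float:
    """Hypothetical: small when the response is less likely from text alone."""
    return -log_sigmoid(beta * (logp_clean - logp_text_only))


def combined_loss(pi_w, pi_l, ref_w, ref_l,
                  logp_irrelevant_corrupted, logp_relevant_corrupted, logp_text_only,
                  beta=0.1, lam_inv=1.0, lam_sens=1.0, lam_lp=1.0):
    """Illustrative weighted sum; the weighting scheme is an assumption."""
    return (
        dpo_loss(pi_w, pi_l, ref_w, ref_l, beta)
        + lam_inv * invariance_penalty(pi_w, logp_irrelevant_corrupted)
        + lam_sens * sensitivity_penalty(pi_w, logp_relevant_corrupted, beta)
        + lam_lp * language_prior_penalty(pi_w, logp_text_only, beta)
    )
```

In this sketch, the invariance term is zero when corrupting the irrelevant modality leaves the chosen response's log-probability unchanged, while the sensitivity and language-prior terms shrink as the clean, fully grounded input becomes more favorable than the corrupted or text-only input.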

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Understanding | MVBench | -- | 425 |
| Audio-visual understanding | DailyOmni | Average Score: 53.82 | 69 |
| Video-driven Audio Hallucination | AVHBench | Accuracy: 83.4 | 27 |
| Cross-modal hallucination evaluation | AVHBench | Overall Accuracy: 88.19 | 22 |
| Audiovisual Matching | AVHBench | Accuracy: 69.68 | 14 |
| Cross-modal Hallucination Detection | Curse of Multi-Modalities (CMM) 1.0 (test) | VL Precision (pa): 92.5 | 14 |
| Audio Understanding | MMAU audio | Sound Score: 72.08 | 10 |
| Multi-turn omni-modal dialog | OmniDialog audiovisual task | OmniDialog Score: 85.86 | 4 |