Cross-Modal Attention Calibration for LVLM Hallucination Mitigation

About

Large vision-language models (LVLMs) have shown remarkable capabilities in visual-language understanding. Despite this success, LVLMs still hallucinate in complex generation tasks, producing content that is inconsistent with the visual input. To address this issue, some approaches introduce inference-time interventions, such as contrastive decoding, to reduce overreliance on language priors. However, these approaches overlook hallucinations stemming from position bias and spurious inter-modality correlations. In this paper, we propose Cross-Modal Attention Calibration (CMAC), a training-free method for mitigating hallucinations in LVLMs. CMAC includes an Inter-Modality Decoding (IMD) module, which alleviates hallucination through a novel contrastive decoding mechanism: IMD masks the value vectors associated with significant cross-modal attention weights and uses the result as the distortion, which addresses both uni-modality overreliance and misleading inter-modality correlations. Additionally, a Cross-Modal Position Calibration (CMPC) module shrinks the position gap of image tokens, alleviating the position bias in cross-modal attention. Experimental results on diverse hallucination benchmarks show that our method reduces hallucinations in LVLMs more effectively than existing state-of-the-art techniques. Our code will be available at https://github.com/lijm48/IMCCD.

Jiaming Li, Jiacheng Zhang, Zequn Jie, Lin Ma, Guanbin Li • 2025
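
The abstract describes IMD only in words, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes that "masking" means zeroing the value vectors at the most-attended positions, and it combines the original and distorted predictions with the generic contrastive decoding form used in earlier work, (1 + alpha) * original - alpha * distorted, restricted to tokens the original model finds plausible. The function names and the parameters `top_k`, `alpha`, and `beta` are hypothetical and do not come from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mask_top_attention_values(attn, values, top_k):
    """Return a copy of `values` with the top_k most-attended rows zeroed.

    attn:   attention weights of one query over the key positions.
    values: one value vector (list of floats) per key position.
    Zeroing is an assumption; the abstract only says the vectors are masked.
    """
    top = sorted(range(len(attn)), key=lambda i: attn[i], reverse=True)[:top_k]
    masked = [list(v) for v in values]
    for i in top:
        masked[i] = [0.0] * len(values[i])
    return masked


def attend(attn, values):
    """Attention output: attention-weighted sum of the value vectors."""
    dim = len(values[0])
    return [sum(a * v[d] for a, v in zip(attn, values)) for d in range(dim)]


def contrastive_scores(logits_orig, logits_dist, alpha=1.0, beta=0.1):
    """Contrast original and distorted next-token predictions.

    Tokens whose original probability is below beta times the top
    probability are excluded (score -inf); the rest are scored as
    (1 + alpha) * log p_orig - alpha * log p_dist.
    """
    lp_orig = log_softmax(logits_orig)
    lp_dist = log_softmax(logits_dist)
    cutoff = math.log(beta) + max(lp_orig)
    return [
        (1 + alpha) * o - alpha * d if o >= cutoff else float("-inf")
        for o, d in zip(lp_orig, lp_dist)
    ]
```

In this sketch, the distorted logits would come from a second forward pass in which `attend` receives the output of `mask_top_attention_values` instead of the original value vectors. Tokens that stay likely even after the most-attended cross-modal evidence is removed are penalized by the subtraction.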

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Hallucination | POPE Popular | F1 Score | 86.55 | 372 |
| Object Hallucination | POPE Adversarial | Accuracy | 83.01 | 353 |
| Object Hallucination | POPE (Random) | F1 Score | 89.5 | 324 |
| Object Hallucination Evaluation | MS-COCO POPE (Popular) | Accuracy | 86.9 | 158 |
| Object Hallucination Evaluation | CHAIR | CHAIRi Score | 55.3 | 154 |
| Object Hallucination Evaluation | POPE (Random) | Accuracy | 89.33 | 152 |
| Object Hallucination Evaluation | MS-COCO POPE Random | Accuracy | 89.23 | 121 |
| Object Hallucination Evaluation | A-OKVQA POPE Popular | Accuracy | 85.73 | 76 |
| Object Hallucination Evaluation | MSCOCO POPE | Random Accuracy | 89.23 | 71 |
| Object Hallucination Evaluation | POPE GQA Popular | Accuracy | 85.5 | 70 |
