Cross-Modal Attention Calibration for LVLM Hallucination Mitigation

About

Large vision-language models (LVLMs) have shown remarkable capabilities in visual-language understanding. Despite this success, LVLMs still hallucinate in complex generation tasks, producing content that is inconsistent with the visual input. To address this issue, some approaches introduce inference-time interventions, such as contrastive decoding, to reduce overreliance on language priors. However, these approaches overlook hallucinations stemming from position bias and spurious inter-modality correlations. In this paper, we propose Cross-Modal Attention Calibration (CMAC), a training-free method for mitigating hallucinations in LVLMs. Within CMAC, we design an Inter-Modality Decoding (IMD) module that alleviates hallucination through a novel contrastive decoding mechanism. IMD masks the value vectors associated with significant cross-modal attention weights to serve as the distortion, which addresses both uni-modality overreliance and misleading inter-modality correlations. Additionally, a Cross-Modal Position Calibration (CMPC) module shrinks the position gap of image tokens, alleviating position bias in cross-modal attention. Experimental results on diverse hallucination benchmarks show that our method reduces hallucinations in LVLMs more effectively than existing state-of-the-art techniques. Our code will be available at https://github.com/lijm48/IMCCD.

Jiaming Li, Jiacheng Zhang, Zequn Jie, Lin Ma, Guanbin Li • 2025
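
The abstract describes IMD only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes single-head dot-product attention, a fixed attention threshold for deciding which image tokens count as "significant", and the logit combination commonly used in contrastive decoding, (1 + alpha) * original - alpha * distorted. The function names, the threshold, and alpha are all hypothetical and do not come from the paper or its repository.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values, masked=frozenset()):
    """Single-head dot-product attention.

    Value vectors at positions in `masked` are zeroed out (assumed form of
    the value masking); the attention weights themselves are unchanged.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    out = [0.0] * len(values[0])
    for i, (w, v) in enumerate(zip(weights, values)):
        if i in masked:
            continue  # masked value vector contributes nothing
        for d in range(len(out)):
            out[d] += w * v[d]
    return out, weights

def significant_image_positions(weights, image_positions, threshold):
    """Image-token positions whose attention weight exceeds the (assumed) threshold."""
    return frozenset(i for i in image_positions if weights[i] > threshold)

def contrastive_logits(original, distorted, alpha=1.0):
    """Generic contrastive-decoding combination; alpha is a hypothetical strength."""
    return [(1 + alpha) * o - alpha * d for o, d in zip(original, distorted)]

# Toy example: positions 0 and 1 are image tokens, position 2 is a text token.
query = [1.0, 0.0]
keys = [[4.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

clean_out, weights = attend(query, keys, values)
masked = significant_image_positions(weights, image_positions=[0, 1], threshold=0.5)
distorted_out, _ = attend(query, keys, values, masked=masked)
```

In a real LVLM, the clean and distorted attention outputs would each flow through the remaining layers to produce two sets of next-token logits, which `contrastive_logits` would then combine before sampling. The sketch does not cover CMPC, since the abstract gives no detail on how the position gap is shrunk.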

Related benchmarks

Task	Dataset	Result
Object Hallucination	POPE Popular	Accuracy 86.92	406
Object Hallucination	POPE Adversarial	Accuracy 83.01	367
Object Hallucination	POPE (Random)	F1 Score 89.5	338
Object Hallucination Evaluation	CHAIR	CHAIRi Score 55.3	174
Object Hallucination Evaluation	MS-COCO POPE (Popular)	Accuracy 86.9	158
Object Hallucination Evaluation	POPE (Random)	Accuracy 89.33	152
Object Hallucination Evaluation	MS-COCO POPE Random	Accuracy 89.23	121
Object Hallucination Evaluation	A-OKVQA POPE Popular	Accuracy 85.73	91
Object Hallucination Evaluation	A-OKVQA POPE Random	Accuracy 88.53	75
Object Hallucination Evaluation	MSCOCO POPE	Random Accuracy 89.23	71

Showing 10 of 21 rows
