
MHSA: A Lightweight Framework for Mitigating Hallucinations via Steered Attention in LVLMs

About

Large vision-language models (LVLMs) have achieved remarkable performance across diverse multimodal tasks, yet they still suffer from hallucinations: they generate content that is inconsistent with the visual input. Prior work, DHCP (Detecting Hallucinations by Cross-modal Attention Pattern), has explored hallucination detection from the perspective of cross-modal attention but does not address hallucination mitigation. In this paper, we propose MHSA (Mitigating Hallucinations via Steered Attention), a lightweight framework that mitigates hallucinations by learning to correct cross-modal attention patterns in LVLMs. MHSA trains a simple three-layer MLP generator to produce corrected attention, guided by supervisory signals from the DHCP discriminator and from the LVLM itself. At inference time, MHSA simply replaces the original cross-modal attention with the corrected one, without modifying any LVLM parameters, and mitigates both discriminative and generative hallucinations across various datasets and LVLMs. By extending cross-modal attention mechanisms from hallucination detection to hallucination mitigation, MHSA offers a new perspective on hallucination research in LVLMs and helps improve their reliability.
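The abstract describes the generator and the inference-time swap only at a high level. As a rough illustration, the sketch below assumes that the cross-modal attention of one generated token is a row vector over image tokens, and that the generator's output is rescaled to that row's original attention mass so attention to text tokens stays untouched. The names `AttentionGenerator` and `steer`, the hidden width, and this normalization are hypothetical choices, not the authors' code, and training (the DHCP-discriminator and LVLM supervision) is omitted entirely; the weights here are random and untrained.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relu(xs: Vector) -> Vector:
    return [max(0.0, x) for x in xs]


class Linear:
    """Dense layer y = W x + b with small random (untrained) weights."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random) -> None:
        scale = 1.0 / math.sqrt(n_in)
        self.w: Matrix = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        self.b: Vector = [0.0] * n_out

    def __call__(self, x: Vector) -> Vector:
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(self.w, self.b)]


class AttentionGenerator:
    """Hypothetical three-layer MLP: original image attention -> corrected image attention."""

    def __init__(self, n_tokens: int, hidden: int = 32, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.l1 = Linear(n_tokens, hidden, rng)
        self.l2 = Linear(hidden, hidden, rng)
        self.l3 = Linear(hidden, n_tokens, rng)

    def __call__(self, attn: Vector) -> Vector:
        h = relu(self.l1(attn))
        h = relu(self.l2(h))
        logits = self.l3(h)
        # Assumed design: keep the row's total image-attention mass, so the
        # attention left over for text tokens is unchanged.
        mass = sum(attn)
        return [mass * p for p in softmax(logits)]


def steer(attn_rows: List[Vector], gen: AttentionGenerator) -> List[Vector]:
    # Inference-time step: replace each row of cross-modal attention with the
    # generator's output. No LVLM weights are touched, only this attention slice.
    return [gen(row) for row in attn_rows]


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy input: 3 generated tokens, each putting 0.4 of its attention on 8 image tokens.
    original = [[0.4 * p for p in softmax([rng.gauss(0.0, 1.0) for _ in range(8)])] for _ in range(3)]
    corrected = steer(original, AttentionGenerator(n_tokens=8))
    print([round(sum(row), 6) for row in corrected])
```

Rescaling by the original mass, rather than emitting a fresh distribution over image tokens alone, is just one way to keep the full attention row valid after the swap; the paper may normalize differently.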

Wei Ding, Yilin Li, Yudong Zhang, Ruobing Xie, Xingwu Sun, Jiansheng Chen, Yu Wang • 2026

Related benchmarks

Task                             Dataset                      Metric     Result   Rank
Object Hallucination Evaluation  POPE MSCOCO                  F1 Score   93.97    60
Object Hallucination Evaluation  MSCOCO                       Accuracy   93.87    43
Image Captioning                 MSCOCO                       CHAIRs     18       26
Object Hallucination Evaluation  POPE Objects365              F1 Score   91.47    5
Hallucination Evaluation         COCO POPE (test)             F1 Score   93.48    3
Hallucination Evaluation         Objects365 POPE (test)       F1 Score   91.16    3
Hallucination Evaluation         Objects365                   Accuracy   90.53    2
Hallucination Evaluation         OpenImages V7                Accuracy   83.73    2
Object Hallucination Evaluation  POPE Objects365 v1 (test)    Accuracy   91.23    2
Object Hallucination Evaluation  POPE ImageNet v1 (test)      Accuracy   86.67    2

Showing 10 of 12 rows.
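The table mixes three kinds of score. A minimal sketch of how they are usually computed, assuming the standard definitions: POPE asks yes/no object-existence questions and is scored by accuracy and F1 with "yes" as the positive class (higher is better), while CHAIRs is the percentage of generated captions that mention at least one object absent from the image (lower is better). The function names are hypothetical, the mentioned and ground-truth object sets are taken as already extracted (the official CHAIR evaluation also maps synonyms onto MSCOCO categories, which is skipped here), and none of this is MHSA's actual evaluation code.

```python
from typing import List, Sequence, Set, Tuple


def pope_scores(preds: Sequence[str], labels: Sequence[str]) -> Tuple[float, float]:
    """Accuracy and F1 for POPE-style yes/no answers, with "yes" as the positive class."""
    tp = sum(p == "yes" and y == "yes" for p, y in zip(preds, labels))
    fp = sum(p == "yes" and y == "no" for p, y in zip(preds, labels))
    fn = sum(p == "no" and y == "yes" for p, y in zip(preds, labels))
    correct = sum(p == y for p, y in zip(preds, labels))
    accuracy = correct / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def chair_s(mentioned: List[Set[str]], ground_truth: List[Set[str]]) -> float:
    """CHAIRs: percentage of captions mentioning at least one object not in the image."""
    # A caption is hallucinated if it mentions any object outside its ground-truth set.
    hallucinated = sum(bool(m - gt) for m, gt in zip(mentioned, ground_truth))
    return 100.0 * hallucinated / len(mentioned)
```

For example, `pope_scores(["yes", "yes", "no", "no"], ["yes", "no", "no", "yes"])` has one true positive, one false positive and one false negative, so it returns `(0.5, 0.5)`; a CHAIRs of 18 would mean that 18% of the generated MSCOCO captions contain at least one hallucinated object.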
