Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models

About

Large Vision-Language Models (LVLMs) exhibit outstanding performance on vision-language tasks but struggle with hallucination problems. Through in-depth analysis of LVLM activation patterns, we reveal two key findings: 1) truthfulness and visual perception capabilities predominantly engage different subsets of attention heads within the model architecture; and 2) truthfulness steering vectors vary significantly across different semantic contexts. Based on these observations, we propose Dynamic Multimodal Activation Steering, a training-free approach for hallucination mitigation. Our method constructs a semantic-based truthfulness steering vector database and computes visual perception steering vectors, enabling context-aware interventions during inference by dynamically selecting the most relevant steering vectors based on input semantic similarity and applying them to the most influential attention heads. We conduct comprehensive experiments across multiple models and datasets, demonstrating that our approach significantly enhances model performance, outperforming existing state-of-the-art methods.
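The inference-time step described above (pick the steering vectors whose semantic key best matches the input, then apply them only to selected attention heads) can be illustrated with a small standard-library Python sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the form of the intervention, or how heads are ranked, so cosine similarity, additive steering with a scale `alpha`, and the names `select_steering_entry`, `steer_heads`, `database`, and `top_heads` are all assumptions made for illustration. The visual perception steering vectors mentioned in the abstract are left out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_steering_entry(query_emb, database):
    """Return the database entry whose semantic key is most similar to the query.

    Assumption: each entry is a dict with a "key" embedding and a "vectors"
    mapping from head index to a per-head steering vector.
    """
    return max(database, key=lambda entry: cosine(query_emb, entry["key"]))


def steer_heads(head_acts, steering, top_heads, alpha=1.0):
    """Add alpha * steering[h] to the activations of the selected heads only.

    Assumption: the intervention is a simple additive shift; heads not listed
    in top_heads are returned unchanged. The input is not modified.
    """
    out = [list(v) for v in head_acts]
    for h in top_heads:
        out[h] = [a + alpha * s for a, s in zip(out[h], steering[h])]
    return out


# Toy data (hypothetical): two semantic contexts, three heads of width 2.
database = [
    {"key": [1.0, 0.0], "vectors": {0: [0.5, 0.5], 2: [-1.0, 0.0]}},
    {"key": [0.0, 1.0], "vectors": {0: [0.0, 1.0], 2: [1.0, 1.0]}},
]
query = [0.9, 0.1]  # stand-in for an embedding of the input
entry = select_steering_entry(query, database)

acts = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
steered = steer_heads(acts, entry["vectors"], top_heads=[0, 2], alpha=2.0)
print(steered)  # [[2.0, 2.0], [2.0, 2.0], [1.0, 3.0]]
```

In this toy run the query is closest to the first entry, so heads 0 and 2 are shifted by that entry's vectors while head 1 is left untouched. In a real model the activations would be per-head attention outputs captured during the forward pass rather than hand-written lists.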

Jianghao Yin, Qin Chen, Kedi Chen, Jie Zhou, Xingjiao Wu, Liang He • 2026

Related benchmarks

Task	Dataset	Result
Object Hallucination Evaluation	POPE	--	2056
Caption Hallucination Evaluation	CHAIR	CS Score 48.8	122
Object Hallucination Evaluation	POPE Popular	Accuracy 87.33	100
Multimodal Model Evaluation	MME	MME Score 2.35e+3	80
Object Hallucination Evaluation	MSCOCO POPE	--	71
Multimodal Benchmarking	MMBench	MMBench Score 83.2	60
Object Probing	POPE (average)	Accuracy 86.81	52
Object Probing	POPE (Random)	Accuracy 90.03	41
Object Probing	POPE Adversarial	Accuracy 83.07	41
Multi-modal Hallucination Evaluation	AMBER	CHAIR 7.1	28
