Mitigating Hallucination in Vision-Language Models through Barrier-Regulated Adaptive Closed-form Steering

About

Large vision-language models (LVLMs) often hallucinate objects that are not present in the input image, largely because visual grounding weakens as decoding progresses. Existing inference-time mitigation methods modify logits or hidden states throughout generation, but they have three key limitations: they lack an explicit grounding objective, they intervene even when the model is already well grounded, and they use fixed correction strengths that do not adapt to the severity of the grounding failure. We propose BRACS (Barrier-Regulated Adaptive Closed-form Steering), a training-free steering framework that addresses all three. BRACS monitors the model's own attention to measure visual grounding and applies corrections to the hidden states only when grounding deteriorates. The corrective update is computed analytically in closed form, so it requires neither training auxiliary networks nor retraining the model. Experiments on LLaVA-1.5-7B and Qwen-VL-Chat show that BRACS consistently outperforms prior methods on hallucination benchmarks, reducing CHAIR$_s$ by 9.4 points and improving POPE F1 by 2.7 points, while matching or improving performance on four general multimodal benchmarks. BRACS also remains efficient: it operates at 80% of greedy decoding throughput and runs on average 1.3 times faster than the baselines.
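The gating idea described above (measure grounding from attention, correct the hidden state only when grounding drops) can be illustrated with a toy sketch. This is not the BRACS update itself, which the abstract does not spell out. The function names, the threshold, the linear strength rule, and the steering direction are all hypothetical choices made for illustration.

```python
def grounding_score(attn, image_positions):
    """Fraction of attention mass that falls on image-token positions.

    Assumption: grounding is approximated by attention mass on image tokens.
    """
    total = sum(attn)
    if total == 0:
        return 0.0
    return sum(attn[i] for i in image_positions) / total


def gated_steer(hidden, direction, score, threshold=0.5, max_strength=1.0):
    """Shift `hidden` along `direction` only when `score` is below `threshold`.

    Assumption: the strength grows linearly with the grounding deficit
    (threshold - score). A well-grounded step is returned unchanged.
    """
    if score >= threshold:
        return list(hidden)  # no intervention when grounding is sufficient
    strength = max_strength * (threshold - score) / threshold
    return [h + strength * d for h, d in zip(hidden, direction)]


# Toy decoding step: one image token (position 0) and two text tokens.
attn = [0.25, 0.25, 0.5]
score = grounding_score(attn, image_positions=[0])
steered = gated_steer([1.0, 2.0], direction=[2.0, 0.0], score=score)
```

In this toy step the image token receives a quarter of the attention mass, which is below the assumed threshold of 0.5, so the hidden state is shifted by half of the steering direction. A step whose score meets the threshold passes through untouched, which is the "intervene only when needed" behavior the abstract contrasts with always-on methods.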

Soumyadeep Jana, Pulkit Mittal, Sanasam Ranbir Singh • 2026

Related benchmarks

Task	Dataset	Result
Multimodal Understanding	MMBench	--	887
Object Hallucination Evaluation	MS-COCO (POPE Adversarial)	Accuracy 84.8	205
Object Hallucination Evaluation	MS-COCO POPE (Popular)	Accuracy 87.43	158
Multimodal Understanding	MME	Score 1.88e+3	150
Multimodal Hallucination Evaluation	MMHal-Bench	Average Score 3.72	140
Object Hallucination Evaluation	MS-COCO POPE Random	Accuracy 89.11	121
Object Hallucination Evaluation	MSCOCO 2014 (val)	CHAIRs 47.2	81
Object Hallucination Detection	POPE MS-COCO Overall	Accuracy 86.83	12
Multimodal Understanding	LLaVA-Bench	LLaVA-B Score 62.8	12
