Mitigating Hallucination in Vision-Language Models through Barrier-Regulated Adaptive Closed-form Steering

About

Large vision-language models (LVLMs) often hallucinate objects that are not present in the input image, largely because visual grounding weakens as decoding progresses. Existing inference-time mitigation methods modify logits or hidden states throughout generation, but they suffer from three key limitations: they lack an explicit grounding objective, they intervene even when the model is already well-grounded, and they use fixed correction strengths that do not adapt to the severity of the grounding failure. We propose BRACS (Barrier-Regulated Adaptive Closed-form Steering), a training-free steering framework that addresses all three limitations. BRACS monitors the model's own attention to measure visual grounding and applies corrections to the hidden states only when grounding deteriorates. The corrective update is computed analytically in closed form, so it requires neither auxiliary networks nor model retraining. Experiments on LLaVA-1.5-7B and Qwen-VL-Chat show that BRACS consistently outperforms prior methods on hallucination benchmarks, reducing CHAIR$_s$ by 9.4 points and improving POPE F1 by 2.7 points, while matching or improving performance on four general multimodal benchmarks. BRACS also remains efficient: it operates at 80% of greedy decoding throughput and runs 1.3 times faster on average than the baselines.
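
The abstract does not give the BRACS update rule, so the following is only a minimal sketch of the general pattern it describes: measure grounding from attention, leave the hidden state alone when grounding is adequate, and otherwise apply a correction whose size follows from a closed-form expression. Everything specific here is an assumption made for illustration and is not the authors' formulation: the function names, the thresholds `g_min` and `tau`, the direction vector `w`, and the use of a minimum-norm step onto a linear constraint.

```python
from typing import List, Sequence


def grounding_score(attn: Sequence[float], image_positions: Sequence[int]) -> float:
    """Share of one attention row's mass that lands on image-token positions."""
    total = sum(attn)
    if total <= 0.0:
        return 0.0
    return sum(attn[i] for i in image_positions) / total


def min_norm_step(h: Sequence[float], w: Sequence[float], tau: float) -> List[float]:
    """Smallest-norm change to h that makes dot(w, h) >= tau.

    Closed-form solution of  min ||d||^2  subject to  dot(w, h + d) >= tau:
        d = max(0, tau - dot(w, h)) / dot(w, w) * w
    The step grows with the size of the violation and is zero when there is none.
    """
    margin = sum(wi * hi for wi, hi in zip(w, h))
    violation = tau - margin
    w_norm_sq = sum(wi * wi for wi in w)
    if violation <= 0.0 or w_norm_sq == 0.0:
        return list(h)  # constraint already satisfied, or no usable direction
    scale = violation / w_norm_sq
    return [hi + scale * wi for hi, wi in zip(h, w)]


def steer(
    h: Sequence[float],
    attn: Sequence[float],
    image_positions: Sequence[int],
    w: Sequence[float],
    g_min: float,
    tau: float,
) -> List[float]:
    """Hypothetical gate: correct h only when attention-based grounding is low."""
    if grounding_score(attn, image_positions) >= g_min:
        return list(h)  # well grounded: no intervention
    return min_norm_step(h, w, tau)


# Toy usage: positions 0 and 1 stand in for image tokens.
weak = steer([1.0, 0.0], [0.125, 0.125, 0.25, 0.5], [0, 1], [0.0, 2.0], 0.3, 1.0)
strong = steer([1.0, 0.0], [0.5, 0.25, 0.125, 0.125], [0, 1], [0.0, 2.0], 0.3, 1.0)
```

In this toy setup the first call moves the hidden state just far enough to meet the assumed constraint, while the second call returns it unchanged because the attention row already places most of its mass on the image positions.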

Soumyadeep Jana, Pulkit Mittal, Sanasam Ranbir Singh • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multimodal Understanding | MMBench | -- | 847 |
| Object Hallucination Evaluation | MS-COCO (POPE Adversarial) | Accuracy 84.8 | 190 |
| Object Hallucination Evaluation | MS-COCO POPE (Popular) | Accuracy 87.43 | 158 |
| Multimodal Hallucination Evaluation | MMHal-Bench | Average Score 3.72 | 129 |
| Multimodal Understanding | MME | Score 1.88e+3 | 125 |
| Object Hallucination Evaluation | MS-COCO POPE Random | Accuracy 89.11 | 121 |
| Object Hallucination Evaluation | MSCOCO 2014 (val) | CHAIR$_s$ 47.2 | 81 |
| Object Hallucination Detection | POPE MS-COCO Overall | Accuracy 86.83 | 12 |
| Multimodal Understanding | LLaVA-Bench | LLaVA-B Score 62.8 | 12 |
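
For readers unfamiliar with the hallucination metrics reported here, the sketch below shows how CHAIR and POPE-style scores are commonly computed. It assumes the objects mentioned in each caption have already been extracted and mapped to the dataset's category names, and it counts each distinct object once per caption; the function names and toy inputs are illustrative and do not come from the paper or from any benchmark toolkit. Scores are returned as fractions, whereas the benchmarks usually report them as percentages.

```python
from typing import List, Set, Tuple


def chair_scores(mentioned: List[Set[str]], present: List[Set[str]]) -> Tuple[float, float]:
    """Return (CHAIR_s, CHAIR_i) as fractions in [0, 1].

    CHAIR_s: share of captions that mention at least one object not in the image.
    CHAIR_i: share of mentioned objects that are not in the image.
    """
    captions_with_hallucination = 0
    hallucinated_mentions = 0
    total_mentions = 0
    for said, truth in zip(mentioned, present):
        wrong = said - truth  # objects named in the caption but absent from the image
        if wrong:
            captions_with_hallucination += 1
        hallucinated_mentions += len(wrong)
        total_mentions += len(said)
    chair_s = captions_with_hallucination / len(mentioned) if mentioned else 0.0
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    return chair_s, chair_i


def pope_metrics(pred_yes: List[bool], gold_yes: List[bool]) -> Tuple[float, float]:
    """Return (accuracy, F1) for yes/no object probes, treating 'yes' as positive."""
    pairs = list(zip(pred_yes, gold_yes))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    tn = sum(1 for p, g in pairs if not p and not g)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


# Toy data: the first caption names a frisbee that is not in the image.
chair_s, chair_i = chair_scores(
    [{"dog", "frisbee"}, {"cat"}],
    [{"dog"}, {"cat", "sofa"}],
)
accuracy, f1 = pope_metrics([True, True, False, False], [True, False, True, False])
```

Lower CHAIR values and higher POPE accuracy or F1 both indicate fewer hallucinated objects.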
