HTDC: Hesitation-Triggered Differential Calibration for Mitigating Hallucination in Large Vision-Language Models

About

Large vision-language models (LVLMs) achieve strong multimodal performance, but still suffer from hallucinations caused by unstable visual grounding and over-reliance on language priors. Existing training-free decoding methods typically apply calibration at every decoding step, introducing unnecessary computation and potentially disrupting stable predictions. We address this problem by identifying layer-wise hesitation, a simple signal of grounding instability reflected by fluctuations in token preference across intermediate layers. Based on this observation, we propose Hesitation-Triggered Differential Calibration (HTDC), a training-free decoding framework that preserves standard full-branch inference and activates calibration only at hesitation-prone steps. When triggered, HTDC contrasts the full branch with two lightweight probes, a visual-nullification probe and a semantic-nullification probe, to suppress hallucination-prone candidates while avoiding unnecessary intervention on stable steps. Experiments on representative hallucination benchmarks show that HTDC consistently reduces hallucinations while maintaining strong task accuracy, achieving a favorable trade-off between effectiveness and computational overhead.
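The abstract gives no formulas, so the following is only a minimal sketch of how a hesitation-triggered decoding step could look, not the authors' implementation. It makes several assumptions. Hesitation is measured as the fraction of adjacent layers whose top-1 token differs. The trigger threshold `tau` and the weights `alpha` and `beta` are hypothetical. The calibration uses a generic contrastive-decoding form that pushes the full-branch log-probabilities away from each probe. The two probes are passed as callables so that they run only when the step is triggered.

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    x = x - np.max(x, axis=-1, keepdims=True)
    return x - np.log(np.sum(np.exp(x), axis=-1, keepdims=True))

def hesitation_score(layer_logits):
    """Fraction of adjacent layer pairs whose top-1 token differs.

    layer_logits: array of shape (num_layers, vocab_size) holding the
    next-token logits read out at each intermediate layer (assumed input).
    """
    top1 = np.argmax(layer_logits, axis=-1)
    if top1.shape[0] < 2:
        return 0.0  # a single layer cannot fluctuate
    return float(np.mean(top1[1:] != top1[:-1]))

def htdc_step(layer_logits, visual_null_probe, semantic_null_probe,
              tau=0.3, alpha=1.0, beta=1.0):
    """One decoding step; returns (log-scores, triggered).

    visual_null_probe / semantic_null_probe: zero-argument callables that
    return next-token logits of shape (vocab_size,). They are hypothetical
    stand-ins for the two probes and are called only on triggered steps.
    """
    full = log_softmax(layer_logits[-1])
    if hesitation_score(layer_logits) < tau:
        return full, False  # stable step: keep the standard prediction
    vis = log_softmax(visual_null_probe())
    sem = log_softmax(semantic_null_probe())
    # Assumed contrastive form: penalize tokens the probes also favor.
    calibrated = full + alpha * (full - vis) + beta * (full - sem)
    return calibrated, True
```

On a step where every layer agrees on the top token, the function returns the unmodified full-branch distribution and never calls the probes, which is where the compute saving over every-step calibration would come from.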

Xinyun Liu • 2026

Related benchmarks

Task	Dataset	Result
Object Hallucination Evaluation	MSCOCO POPE	Random Accuracy 91.24	71
Object Hallucination Mitigation	CHAIR	CHAIRs Score 11.6	22
Large Vision-Language Model Evaluation	MME	Perception Score 1.71e+3	12
Object Hallucination Evaluation	GQA POPE	Accuracy (Random) 92.87	12

