
SAKED: Mitigating Hallucination in Large Vision-Language Models via Stability-Aware Knowledge Enhanced Decoding

About

Hallucinations in Large Vision-Language Models (LVLMs) pose significant security and reliability risks in real-world applications. Inspired by the observation that humans are more error-prone when uncertain or hesitant, we investigate how instability in a model's internal knowledge contributes to LVLM hallucinations. We conduct extensive empirical analyses from three perspectives, namely attention heads, model layers, and decoding tokens, and identify three key hallucination patterns: (i) visual activation drift across attention heads, (ii) pronounced knowledge fluctuations across layers, and (iii) visual focus distraction between neighboring output tokens. Building on these findings, we propose Stability-Aware Knowledge-Enhanced Decoding (SAKED), which introduces a layer-wise Knowledge Stability Score (KSS) to quantify knowledge stability throughout the model. By contrasting the most stability-aware and stability-agnostic layers, SAKED suppresses decoding noise and dynamically leverages the most reliable internal knowledge for faithful token generation. Moreover, SAKED is training-free and can be seamlessly integrated into different architectures. Extensive experiments demonstrate that SAKED achieves state-of-the-art performance for hallucination mitigation across various models, tasks, and benchmarks.
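The abstract does not spell out the KSS formula or the contrastive rule, so the PyTorch sketch below illustrates one plausible reading in the spirit of layer-contrastive decoding: early-exit logits are collected per layer, a stability score is derived from how little each layer's next-token distribution drifts relative to its neighbors, and the most stable layer's logits are contrasted against the least stable layer's. The symmetric-KL drift measure, the coefficient `alpha`, and all function names are illustrative assumptions, not the paper's definitions.

```python
import torch
import torch.nn.functional as F

def knowledge_stability_scores(layer_logits: torch.Tensor) -> torch.Tensor:
    """Assumed stability score: negated symmetric KL drift of each layer's
    next-token distribution relative to its neighboring layers.

    layer_logits: (num_layers, vocab_size) early-exit logits for one
    decoding step. Returns (num_layers,); higher means more stable.
    """
    log_p = F.log_softmax(layer_logits, dim=-1)            # (L, V)
    p = log_p.exp()
    # Symmetric KL divergence between each pair of consecutive layers.
    kl_fwd = (p[:-1] * (log_p[:-1] - log_p[1:])).sum(-1)   # (L-1,)
    kl_bwd = (p[1:] * (log_p[1:] - log_p[:-1])).sum(-1)    # (L-1,)
    drift = kl_fwd + kl_bwd
    # Assign each layer the mean drift to its neighbors.
    per_layer = torch.zeros(layer_logits.size(0), device=layer_logits.device)
    per_layer[:-1] += drift
    per_layer[1:] += drift
    per_layer[1:-1] /= 2
    return -per_layer  # low drift -> high stability

def stability_contrastive_logits(layer_logits: torch.Tensor,
                                 alpha: float = 0.5) -> torch.Tensor:
    """Contrast the most stable ('stability-aware') layer against the least
    stable ('stability-agnostic') one; the form and alpha are assumptions."""
    scores = knowledge_stability_scores(layer_logits)
    stable = layer_logits[scores.argmax()]
    unstable = layer_logits[scores.argmin()]
    # Amplify what the stable layer encodes, suppress unstable noise.
    return (1 + alpha) * stable - alpha * unstable

# Toy usage: 32 layers, 100-token vocabulary.
logits = torch.randn(32, 100)
next_token = stability_contrastive_logits(logits).argmax().item()
```

Because the adjustment touches only the per-step logits, a scheme like this stays training-free and can wrap any decoder that exposes intermediate-layer hidden states, consistent with the plug-in property the abstract claims.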

Zhaoxu Li, Chenqi Kong, Peijun Bao, Song Xia, Yi Tu, Yi Yu, Xinghao Jiang, Xudong Jiang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Hallucination Evaluation | POPE | - | 132 |
| Hallucination Evaluation | AMBER | F1 Score: 86.5 | 71 |
| Hallucination Evaluation | CHAIR MSCOCO 2014 (val) | CHAIRi: 14 | 39 |
| Language Quality Evaluation | CHAIR benchmark (test) | BLEU-1: 19.2 | 16 |
