
ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models

About

Recent Large Vision-Language Models (LVLMs) have introduced a new paradigm for understanding and reasoning about image input through textual responses. Although they have achieved remarkable performance across a range of multi-modal tasks, they face the persistent challenge of hallucination, which introduces practical weaknesses and raises concerns about their reliable deployment in real-world applications. Existing work has explored contrastive decoding approaches to mitigate this issue, where the output of the original LVLM is compared and contrasted with that of a perturbed version. However, these methods require two or more queries that slow down LVLM response generation, making them less suitable for real-time applications. To overcome this limitation, we propose ONLY, a training-free decoding approach that requires only a single query and a one-layer intervention during decoding, enabling efficient real-time deployment. Specifically, we enhance textual outputs by selectively amplifying crucial textual information using a text-to-visual entropy ratio for each token. Extensive experimental results demonstrate that our proposed ONLY consistently outperforms state-of-the-art methods across various benchmarks while requiring minimal implementation effort and computational cost. Code is available at https://github.com/zifuwan/ONLY.
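To make the cost argument concrete, here is a minimal sketch of the contrastive decoding baseline that the abstract contrasts against. It is not the ONLY method itself. The combination rule `(1 + alpha) * original - alpha * perturbed`, the `alpha` parameter, the function names, and the toy logits are all assumptions chosen for illustration; none of them come from this page. The point is that each decoding step needs logits from two forward passes, which is the overhead a single-query approach avoids.

```python
def contrastive_logits(original, perturbed, alpha=1.0):
    """Combine next-token logits from two forward passes.

    Assumed rule (illustrative): (1 + alpha) * original - alpha * perturbed.
    `original` comes from the model on the clean image, `perturbed` from the
    same model on a perturbed input, so two queries are needed per step.
    """
    return [(1 + alpha) * o - alpha * p for o, p in zip(original, perturbed)]


def argmax_token(vocab, logits):
    """Return the vocabulary entry with the highest logit."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return vocab[best]


# Hypothetical two-token vocabulary and logits, for illustration only.
vocab = ["cat", "dog"]
original = [2.0, 2.2]   # clean image: a language prior slightly favors "dog"
perturbed = [0.5, 2.5]  # perturbed image: the prior for "dog" remains strong

combined = contrastive_logits(original, perturbed, alpha=1.0)
print(argmax_token(vocab, original))  # dog
print(argmax_token(vocab, combined))  # cat
```

In this toy case the token favored by the perturbed pass is penalized, so the choice moves from "dog" to "cat". The price is the second forward pass at every step.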

Zifu Wan, Ce Zhang, Silong Yong, Martin Q. Ma, Simon Stepputtis, Louis-Philippe Morency, Deva Ramanan, Katia Sycara, Yaqi Xie • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Object Hallucination Evaluation | POPE | Accuracy 85.1 | 1455 |
| Multimodal Evaluation | MME | -- | 658 |
| Multimodal Understanding | MMBench | Accuracy 64.87 | 637 |
| Multimodal Understanding | MM-Vet | MM-Vet Score 32.5 | 531 |
| Science Question Answering | ScienceQA | Accuracy 67.23 | 502 |
| Multimodal Capability Evaluation | MM-Vet | Score 46.97 | 345 |
| Multimodal Understanding | MMStar | Accuracy 32.27 | 324 |
| Object Hallucination | POPE Adversarial | Accuracy 86.93 | 288 |
| Object Hallucination | POPE (Random) | F1 Score 89.09 | 285 |
| Object Hallucination | POPE Popular | F1 Score 87.79 | 273 |
Showing 10 of 32 rows
