ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models

About

Recent Large Vision-Language Models (LVLMs) have introduced a new paradigm for understanding and reasoning about image input through textual responses. Although they have achieved remarkable performance across a range of multi-modal tasks, they face the persistent challenge of hallucination, which introduces practical weaknesses and raises concerns about their reliable deployment in real-world applications. Existing work has explored contrastive decoding approaches to mitigate this issue, where the output of the original LVLM is compared and contrasted with that of a perturbed version. However, these methods require two or more queries that slow down LVLM response generation, making them less suitable for real-time applications. To overcome this limitation, we propose ONLY, a training-free decoding approach that requires only a single query and a one-layer intervention during decoding, enabling efficient real-time deployment. Specifically, we enhance textual outputs by selectively amplifying crucial textual information using a text-to-visual entropy ratio for each token. Extensive experimental results demonstrate that our proposed ONLY consistently outperforms state-of-the-art methods across various benchmarks while requiring minimal implementation effort and computational cost. Code is available at https://github.com/zifuwan/ONLY.
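For context on the baseline the abstract contrasts against, the following is a minimal sketch of contrastive decoding. It is not the ONLY method and is not taken from the linked repository. The combination rule `(1 + alpha) * original - alpha * perturbed`, the value of `alpha`, the helper names, and the toy logits are all assumptions chosen for illustration. The sketch shows why such approaches need two forward passes per generated token, which is the cost the abstract says a single-query method avoids.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(original, perturbed, alpha=1.0):
    """Hypothetical helper: boost tokens the original pass favors
    more strongly than the perturbed pass does."""
    return [(1 + alpha) * o - alpha * p for o, p in zip(original, perturbed)]


# Toy next-token logits over a 3-word vocabulary (made-up numbers).
vocab = ["cat", "dog", "car"]
original = [2.0, 1.5, 0.1]   # pass 1: model queried with the real image
perturbed = [1.0, 1.6, 0.1]  # pass 2: model queried with a distorted image

adjusted = contrastive_logits(original, perturbed, alpha=1.0)
probs = softmax(adjusted)
best = vocab[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

Each decoding step here consumes two sets of logits, so a real system would run the model twice per token. A single-query approach replaces the second pass with an intervention inside the one forward pass it already performs.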

Zifu Wan, Ce Zhang, Silong Yong, Martin Q. Ma, Simon Stepputtis, Louis-Philippe Morency, Deva Ramanan, Katia Sycara, Yaqi Xie • 2025

Related benchmarks

Task	Dataset	Result
Object Hallucination Evaluation	POPE	Accuracy 85.1	2019
Multimodal Understanding	MMBench	Accuracy 64.87	847
Science Question Answering	ScienceQA	Accuracy 67.23	791
Multimodal Evaluation	MME	--	727
Multimodal Understanding	MM-Vet	MM-Vet Score 32.5	631
Multimodal Understanding	MMStar	Accuracy 32.27	407
Hallucination Evaluation	CHAIR	CHAIR_s 52.2	393
Multimodal Capability Evaluation	MM-Vet	Score 46.97	393
Object Hallucination	POPE Popular	F1 Score 87.79	372
Object Hallucination	POPE Adversarial	Accuracy 86.93	353

Showing 10 of 38 rows
