ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
About
Recent Large Vision-Language Models (LVLMs) have introduced a new paradigm for understanding and reasoning about image input through textual responses. Although they have achieved remarkable performance across a range of multi-modal tasks, they face the persistent challenge of hallucination, which introduces practical weaknesses and raises concerns about their reliable deployment in real-world applications. Existing work has explored contrastive decoding approaches to mitigate this issue, where the output of the original LVLM is compared and contrasted with that of a perturbed version. However, these methods require two or more queries that slow down LVLM response generation, making them less suitable for real-time applications. To overcome this limitation, we propose ONLY, a training-free decoding approach that requires only a single query and a one-layer intervention during decoding, enabling efficient real-time deployment. Specifically, we enhance textual outputs by selectively amplifying crucial textual information using a text-to-visual entropy ratio for each token. Extensive experimental results demonstrate that our proposed ONLY consistently outperforms state-of-the-art methods across various benchmarks while requiring minimal implementation effort and computational cost. Code is available at https://github.com/zifuwan/ONLY.
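To give a rough feel for what a per-token text-to-visual entropy ratio could look like, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that the entropy is taken over one token's attention weights, split into visual and textual positions; the abstract does not say which distribution is used. The function names, the `is_visual` mask, and the `eps` parameter are all hypothetical.

```python
import math


def entropy(weights):
    """Shannon entropy (in nats) of non-negative weights, normalized to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0)


def text_to_visual_entropy_ratio(attn, is_visual, eps=1e-12):
    """Ratio of attention entropy over text positions to that over visual positions.

    Assumption (not from the paper): `attn` holds one token's attention
    weights and `is_visual` marks which positions are image tokens.
    """
    text = [a for a, v in zip(attn, is_visual) if not v]
    visual = [a for a, v in zip(attn, is_visual) if v]
    return entropy(text) / (entropy(visual) + eps)


# Toy example: four visual positions with uniform attention,
# two text positions with sharply peaked attention.
attn = [0.1, 0.1, 0.1, 0.1, 0.55, 0.05]
is_visual = [True, True, True, True, False, False]
ratio = text_to_visual_entropy_ratio(attn, is_visual)
```

In this toy case the attention over text positions is much more concentrated than over visual positions, so the ratio falls below 1. How such a ratio would drive the one-layer intervention is specified in the paper and the linked repository, not here.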
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Object Hallucination Evaluation | POPE | Accuracy | 85.1 | 935 |
| Multimodal Evaluation | MME | -- | -- | 557 |
| Multimodal Understanding | MM-Vet | MM-Vet Score | 32.5 | 418 |
| Multimodal Understanding | MMBench | Accuracy | 64.87 | 367 |
| Multimodal Capability Evaluation | MM-Vet | Score | 46.97 | 282 |
| Science Question Answering | ScienceQA | Accuracy | 67.23 | 229 |
| Object Hallucination | POPE (Random) | F1 Score | 89.09 | 200 |
| Multimodal Understanding | MMStar | Accuracy | 32.27 | 197 |
| Object Hallucination | POPE Adversarial | Accuracy | 86.93 | 196 |
| Object Hallucination | POPE Popular | F1 Score | 87.79 | 188 |