
Breaking the Illusion: When Positive Meets Negative in Multimodal Decoding

About

Vision-Language Models (VLMs) are frequently undermined by object hallucination, generating content that contradicts visual reality, due to an over-reliance on linguistic priors. We introduce Positive-and-Negative Decoding (PND), a training-free inference framework that intervenes directly in the decoding process to enforce visual fidelity. PND is motivated by our finding of an attention imbalance in VLMs, where visual features are under-weighted. Our framework introduces a dual-path contrast: a positive path that amplifies visual evidence and a negative path that constructs counterfactuals to penalize prior-dominant generation. By contrasting outputs from both paths during decoding, PND steers generation toward visually grounded results. Experiments on POPE, MME, and CHAIR demonstrate state-of-the-art performance without retraining.

Yubo Jiang, Yitong An, Xin Yang, Abudukelimu Wuerkaixi, Xuxin Cheng, Fengying Xie, Zhiguo Jiang, Cao Liu, Ke Zeng, Haopeng Zhang • 2026
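
The abstract describes contrasting the outputs of a positive and a negative path at each decoding step but gives no formula. The sketch below shows one common way such a contrast is written in contrastive decoding generally; it is an illustration under stated assumptions, not the authors' implementation. The function names, the weight `alpha`, the plausibility threshold `beta`, and the toy logits are all hypothetical, and how each path's logits are produced (amplifying visual evidence, constructing counterfactuals) is left outside the sketch.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrast_step(pos_logits, neg_logits, alpha=1.0, beta=0.1):
    """Greedily choose the next token id by contrasting two paths.

    pos_logits: next-token logits from the visually amplified (positive) path.
    neg_logits: next-token logits from the counterfactual (negative) path.
    alpha:      assumed contrast strength; alpha=0 reduces to greedy decoding
                on the positive path.
    beta:       assumed plausibility threshold; tokens whose positive-path
                probability is below beta * (max probability) are skipped.
    """
    pos_lp = log_softmax(pos_logits)
    neg_lp = log_softmax(neg_logits)
    cutoff = max(pos_lp) + math.log(beta)

    best_id, best_score = None, -math.inf
    for token_id, (p, n) in enumerate(zip(pos_lp, neg_lp)):
        if p < cutoff:
            continue  # implausible under the positive path
        # Reward visual evidence, penalize what the prior-dominant path prefers.
        score = (1 + alpha) * p - alpha * n
        if score > best_score:
            best_id, best_score = token_id, score
    return best_id


# Toy 3-token vocabulary: token 0 is favored by the language prior,
# token 1 is supported by the image, token 2 is implausible.
pos = [2.0, 1.8, -5.0]
neg = [3.0, 0.0, -5.0]
print(contrast_step(pos, neg, alpha=0.0))  # 0: plain greedy on the positive path
print(contrast_step(pos, neg, alpha=1.0))  # 1: the contrast flips the choice
```

In this toy case the negative path is far more confident in token 0 than the positive path is, so subtracting its log-probability moves the choice to the token whose support comes from the image. The plausibility cutoff keeps the subtraction from promoting tokens that the positive path itself considers very unlikely.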

Related benchmarks

Task | Dataset | Metric | Result | Rank
Object Hallucination Evaluation | MS-COCO (POPE Adversarial) | Accuracy | 87.26 | 190
Object Hallucination Evaluation | MS-COCO POPE (Popular) | Accuracy | 86.1 | 158
Object Hallucination Evaluation | MS-COCO POPE Random | Accuracy | 87.63 | 121
Image Captioning | CHAIR | CHAIR_S | 46 | 32
Multimodal Perception | MME | Existence Score | 195 | 26
