
REVIS: Sparse Latent Steering to Mitigate Object Hallucination in Large Vision-Language Models

About

Despite the advanced capabilities of Large Vision-Language Models (LVLMs), they frequently suffer from object hallucination. One reason is that visual features and pretrained textual representations often become intertwined in the deeper network layers. To address this, we propose REVIS, a training-free framework designed to explicitly re-activate this suppressed visual information. Rooted in latent space geometry, REVIS extracts the pure visual information vector via orthogonal projection and employs a calibrated strategy to perform sparse intervention only at the precise depth where suppression occurs. This surgical approach effectively restores visual information with minimal computational cost. Empirical evaluations on standard benchmarks demonstrate that REVIS reduces object hallucination rates by approximately 19% compared to state-of-the-art baselines, while preserving general reasoning capabilities.
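The two ideas named in the abstract, orthogonal projection and intervention at a single depth, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the way the visual and textual vectors are obtained, the choice of layer, and the strength `alpha` are all assumptions made for illustration; the abstract does not specify how REVIS calibrates them.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_component(visual: Vector, text: Vector) -> Vector:
    """Return the part of `visual` that is orthogonal to `text`.

    Hypothetical helper: subtracts the projection of `visual` onto `text`,
    so the result has zero dot product with `text`.
    """
    denom = dot(text, text)
    if denom == 0.0:
        # A zero text vector carries no direction to project out.
        return list(visual)
    scale = dot(visual, text) / denom
    return [v - scale * t for v, t in zip(visual, text)]


def steer(hidden_states: List[Vector], direction: Vector,
          layer: int, alpha: float) -> List[Vector]:
    """Add `alpha * direction` to the hidden state of one layer only.

    Hypothetical helper: every other layer is returned unchanged, which is
    the "sparse" part of the intervention in this sketch.
    """
    steered = [list(h) for h in hidden_states]
    steered[layer] = [h + alpha * d for h, d in zip(steered[layer], direction)]
    return steered


# Toy example with 2-dimensional vectors and three layers.
visual_vec = [3.0, 4.0]
text_vec = [1.0, 0.0]
pure_visual = orthogonal_component(visual_vec, text_vec)
layers = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
steered_layers = steer(layers, pure_visual, layer=1, alpha=0.5)
```

In this toy setup only the middle layer changes, and the added direction has no component along the textual vector. A real model would apply the same arithmetic to high-dimensional hidden states at a layer chosen by some calibration procedure.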

Jialin Wu, Wei Shi, Han Shen, Peigui Qi, Kunsheng Tang, Zhicong Huang, Binghao Wang, Zhou Yang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Object Hallucination Evaluation | POPE | -- | 1455 |
| Multimodal Capability Evaluation | MM-Vet | Score: 47.48 | 345 |
| Object Hallucination | POPE Adversarial | Accuracy: 87.8 | 288 |
| Object Hallucination | POPE (Random) | F1 Score: 91.43 | 285 |
| Object Hallucination | POPE Popular | F1 Score: 89.91 | 273 |
| Hallucination Evaluation | CHAIR | CHAIR_s: 30 | 252 |
| Vision-Language Understanding | MM-Vet | Total Score: 72.16 | 43 |
| Large Multi-modal Model Evaluation | MME | Perception Score: 1.51e+3 | 22 |
| Vision-Language Evaluation | MME (full) | Perception Score: 1.72e+3 | 7 |
