Cognitive Pivot Points and Visual Anchoring: Unveiling and Rectifying Hallucinations in Multimodal Reasoning Models

About

Multimodal Large Reasoning Models (MLRMs) have made remarkable strides in visual reasoning through test-time compute scaling, yet long-chain reasoning remains prone to hallucinations. We identify a concerning phenomenon termed the Reasoning Vision Truth Disconnect (RVTD): hallucinations are strongly correlated with cognitive bifurcation points, which often exhibit high-entropy states. We attribute this vulnerability to a breakdown in visual semantic anchoring, localized within the network's intermediate layers; specifically, during these high-uncertainty transitions, the model fails to query visual evidence and reverts instead to language priors. Consequently, we advocate a shift from relying solely on outcome-level supervision to augmenting it with fine-grained internal attention guidance. To this end, we propose V-STAR (Visual Structural Training with Attention Reinforcement), a lightweight, holistic training paradigm designed to internalize visually aware reasoning capabilities. Central to our approach is the Hierarchical Visual Attention Reward (HVAR), integrated within the GRPO framework. Upon detecting high-entropy states, this mechanism dynamically incentivizes visual attention across critical intermediate layers, thereby anchoring the reasoning process back to the visual input. Furthermore, we introduce the Forced Reflection Mechanism (FRM), a trajectory-editing strategy that disrupts cognitive inertia by triggering reflection around high-entropy cognitive bifurcation points and encouraging verification of subsequent steps against the visual input, thereby translating external debiasing interventions into an intrinsic capability for hallucination mitigation.
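
The abstract gives no formulas for the entropy trigger or for HVAR, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that a "high-entropy state" can be flagged by thresholding the Shannon entropy of the next-token distribution, that visual attention can be summarized as the attention mass placed on image-token positions in a chosen set of layers, and that the resulting reward is standardized within a sampled group, as GRPO does. The function names, the threshold, the layer selection, and the head-averaged attention layout are all hypothetical.

```python
import math
from statistics import mean, pstdev


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_steps(step_probs, threshold):
    """Indices of decoding steps whose entropy exceeds `threshold`.

    `threshold` is a hypothetical cutoff; the abstract does not specify one.
    """
    return [t for t, probs in enumerate(step_probs)
            if token_entropy(probs) > threshold]


def visual_attention_reward(attn, visual_positions, steps, layers):
    """Mean attention mass on visual tokens over the given layers and steps.

    Assumed layout: attn[layer][step] is an attention distribution over
    input positions, already averaged over heads.
    """
    if not steps or not layers:
        return 0.0
    total = 0.0
    for layer in layers:
        for t in steps:
            total += sum(attn[layer][t][i] for i in visual_positions)
    return total / (len(layers) * len(steps))


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group.

    Uses the population standard deviation; implementations differ on this.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy example: two decoding steps over a four-token vocabulary.
step_probs = [
    [0.97, 0.01, 0.01, 0.01],  # confident step, low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform step, entropy = ln 4
]
steps = high_entropy_steps(step_probs, threshold=1.0)

# Two layers, two steps, four input positions; positions 0 and 1 are "visual".
attn = [
    [[0.1, 0.1, 0.4, 0.4], [0.3, 0.3, 0.2, 0.2]],
    [[0.25, 0.25, 0.25, 0.25], [0.4, 0.4, 0.1, 0.1]],
]
reward = visual_attention_reward(attn, visual_positions=[0, 1],
                                 steps=steps, layers=[0, 1])
```

In this toy setup only the uniform step is flagged, and the reward is the average share of attention that the two layers place on the visual positions at that step. How such a term would be weighted against the outcome reward, and which intermediate layers count as critical, are details the abstract leaves open.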

Zhe Qian, Yanbiao Ma, Zhuohan Ouyang, Zhonghua Wang, Zhongxing Xu, Fei Luo, Xinyu Liu, Zongyuan Ge, Yike Guo, Jungong Han • 2026

Related benchmarks

Task | Dataset | Result | Rank
Mathematical Reasoning | MathVista | Accuracy: 74.9 | 382
Object Hallucination | POPE Popular | F1 Score: 86.6 | 372
Object Hallucination | POPE Adversarial | Accuracy: 87.6 | 353
Object Hallucination | POPE (Random) | F1 Score: 88 | 324
Mathematical Reasoning | MathVerse | Accuracy: 54.6 | 183
Mathematical Reasoning | MathVision | Accuracy: 33.7 | 168
Hallucination Evaluation | HallusionBench | Accuracy: 62.3 | 153
Mathematical Reasoning | MMATH | Accuracy: 41.1 | 36
Mathematical Reasoning | Geometry3K | Accuracy: 69.4 | 26
Mathematical Reasoning | VisuLogic | Accuracy (VisuLogic Math): 30.6 | 24
Showing 10 of 29 rows
