
PDCR: Perception-Decomposed Confidence Reward for Vision-Language Reasoning

About

Reinforcement Learning with Verifiable Rewards (RLVR) traditionally relies on a sparse, outcome-based signal. Recent work shows that a fine-grained, model-intrinsic signal (rewarding the growth in the model's confidence in the ground-truth answer) improves language reasoning training by providing step-level guidance without costly external models. Although this signal is effective for unimodal text, we find that naively applying it as a global reward to vision-language (V-L) reasoning is suboptimal, because the task is a heterogeneous mix of sparse visual perception and dense textual reasoning. Global normalization causes mixture-induced signal degradation: the training signal for visual steps is statistically distorted by the predominant textual steps. We propose Perception-Decomposed Confidence Reward (PDCR), a framework that addresses this by aligning the reward structure with the task's heterogeneous nature. PDCR first performs an unsupervised skill decomposition, introducing a model-internal Visual Dependence Score to quantify visual reliance and applying a clustering algorithm to separate perception steps from reasoning steps. Based on this decomposition, PDCR computes a decomposed advantage by normalizing confidence gains within each skill cluster. This intra-cluster normalization provides a stable, correctly scaled signal for both perception and reasoning. We demonstrate that PDCR outperforms the naive global-reward formulation and sparse-reward baselines on key V-L reasoning benchmarks.
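To make the difference between global and intra-cluster normalization concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. The per-step Visual Dependence Scores are taken as given inputs (the abstract does not define their formula), the confidence gain of a step is assumed to be the change in the ground-truth answer's probability from the previous step, the unspecified clustering algorithm is replaced by a plain two-means split of the 1-D scores, and normalization is assumed to be a z-score over one batch of steps. All function names (`confidence_gains`, `zscore`, `two_means_1d`, `global_advantages`, `decomposed_advantages`) are hypothetical.

```python
from statistics import mean, pstdev


def confidence_gains(answer_probs):
    """Hypothetical helper: per-step confidence growth in the ground-truth answer.

    Assumes answer_probs[t] is the model's probability of the ground-truth
    answer after reasoning step t; the gain of a step is the change from the
    previous step.
    """
    return [curr - prev for prev, curr in zip(answer_probs, answer_probs[1:])]


def zscore(values, eps=1e-8):
    """Normalize values to zero mean and (roughly) unit population std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def two_means_1d(scores, iters=20):
    """Stand-in clustering (assumption): Lloyd's k-means with k=2 on 1-D scores.

    Returns one label per step: 1 = high visual dependence ("perception"),
    0 = low visual dependence ("reasoning").
    """
    lo, hi = min(scores), max(scores)
    labels = [0] * len(scores)
    for _ in range(iters):
        # Assign each score to the nearer centroid (ties go to "reasoning").
        labels = [1 if abs(s - hi) < abs(s - lo) else 0 for s in scores]
        hi_group = [s for s, lab in zip(scores, labels) if lab == 1]
        lo_group = [s for s, lab in zip(scores, labels) if lab == 0]
        if not hi_group or not lo_group:
            break  # degenerate case, e.g. all scores equal
        new_lo, new_hi = mean(lo_group), mean(hi_group)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels


def global_advantages(gains):
    """Naive baseline: one normalization over all steps, regardless of skill."""
    return zscore(gains)


def decomposed_advantages(gains, visual_scores):
    """Normalize confidence gains separately inside each skill cluster.

    visual_scores are assumed to be precomputed Visual Dependence Scores,
    one per step; how PDCR computes them is not specified here.
    """
    labels = two_means_1d(visual_scores)
    advantages = [0.0] * len(gains)
    for cluster in (0, 1):
        idx = [i for i, lab in enumerate(labels) if lab == cluster]
        if not idx:
            continue
        for i, a in zip(idx, zscore([gains[i] for i in idx])):
            advantages[i] = a
    return labels, advantages


if __name__ == "__main__":
    # Toy data: two perception steps (high visual dependence, small gains)
    # followed by six reasoning steps (low visual dependence, larger gains).
    visual = [0.9, 0.8, 0.1, 0.05, 0.15, 0.1, 0.2, 0.05]
    gains = [0.01, 0.05, 0.40, -0.30, 0.50, -0.20, 0.30, -0.10]
    labels, decomposed = decomposed_advantages(gains, visual)
    print("labels:    ", labels)
    print("global:    ", [round(a, 2) for a in global_advantages(gains)])
    print("decomposed:", [round(a, 2) for a in decomposed])
```

In this toy run, global normalization gives both perception steps a negative advantage because their small gains sit below a mean dominated by the reasoning steps, even though the second perception step raised confidence more than the first. Normalizing within the perception cluster instead assigns them roughly -1 and +1. The actual clustering method and normalization scope used by PDCR (per batch, per prompt group, or otherwise) may differ from this sketch.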

Hee Suk Yoon, Eunseop Yoon, Ji Woo Hong, SooHwan Eom, Gwanhyeong Koo, Mark Hasegawa-Johnson, Qi Dai, Chong Luo, Chang D. Yoo • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Mathematical Multimodal Reasoning | MathVerse | Accuracy | 55 | 259
Multimodal Math Reasoning | MathVision | Accuracy | 44.8 | 246
Visual Mathematical Reasoning | MathVerse | Accuracy | 70.6 | 155
General Visual Understanding | RealworldQA | Accuracy | 70.7 | 62
General Visual Understanding | MMMU | Accuracy | 57.1 | 35
General Visual Understanding | MMMU-Pro | Accuracy | 50.7 | 30
General Visual Understanding | VisNumBench | Accuracy | 41.1 | 30
Hallucination Diagnosis | HallusionBench | -- | -- | 15
Visual Math & Hallucination | MathVision | Accuracy | 51 | 5
Visual Math & Hallucination | HallusionBench | Accuracy | 76 | 5
