
TIM-PRM: Verifying multimodal reasoning with Tool-Integrated PRM

About

Multimodal Large Language Models (MLLMs) have achieved impressive performance in mathematical reasoning, yet they remain vulnerable to visual hallucinations and logical inconsistencies that standard outcome-based supervision fails to mitigate. While Process Reward Models (PRMs) promise step-by-step verification, current approaches typically operate as scalar scorers or generative critics that suffer from sycophancy, blindly validating flawed hypotheses rather than grounding them in visual reality. To bridge this gap, we introduce TIM-PRM (Tool-Integrated Multimodal PRM), a novel agentic framework that transforms verification from a passive classification task into an active, tool-augmented investigation. TIM-PRM is trained to explicitly plan verification strategies and uses a mechanism of Independent Question Asking to query evidence via external tools, effectively decoupling verification from the reasoning context to eliminate confirmation bias. We instantiate this method by curating a high-quality dataset of tool-integrated verification trajectories. Extensive experiments on VisualProcessBench demonstrate that our 8B-parameter model surpasses existing open-source multimodal PRMs, significantly outperforming much larger models such as Qwen2.5-72B and InternVL-78B, while offering interpretable insights into the verification process.

Peng Kuang, Xiangxiang Wang, Wentao Liu, Jian Dong, Kaidi Xu • 2025
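
The abstract describes Independent Question Asking only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a verifier that, for each reasoning step, forms a question, sends only that question to an external tool, and then judges the step against the returned evidence. Every name here (`verify_steps`, `StepVerdict`, the `toy_*` helpers, and the `FACTS` lookup standing in for a visual tool) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepVerdict:
    step_index: int
    question: str
    evidence: str
    is_correct: bool


def verify_steps(
    steps: List[str],
    ask_question: Callable[[str], str],
    query_tool: Callable[[str], str],
    judge: Callable[[str, str], bool],
) -> List[StepVerdict]:
    """Check each step against evidence gathered independently of the solution."""
    verdicts = []
    for i, step in enumerate(steps):
        question = ask_question(step)
        # The tool receives only the question, never the reasoning chain,
        # so its answer cannot simply echo the solver's claim.
        evidence = query_tool(question)
        verdicts.append(StepVerdict(i, question, evidence, judge(step, evidence)))
    return verdicts


def first_incorrect_step(verdicts: List[StepVerdict]) -> Optional[int]:
    """Return the index of the first step judged incorrect, or None."""
    for verdict in verdicts:
        if not verdict.is_correct:
            return verdict.step_index
    return None


# Toy stand-ins (assumptions): a real system would call a model and visual tools.
FACTS = {"base": "4", "height": "3", "area": "6"}
PREFIX = "What is the "


def toy_ask(step: str) -> str:
    quantity = step.split("=")[0].strip()
    return f"{PREFIX}{quantity}?"


def toy_tool(question: str) -> str:
    quantity = question[len(PREFIX):-1]
    return FACTS[quantity]


def toy_judge(step: str, evidence: str) -> bool:
    claimed = step.split("=")[1].strip()
    return claimed == evidence


steps = ["base = 4", "height = 3", "area = 12"]
verdicts = verify_steps(steps, toy_ask, toy_tool, toy_judge)
print(first_incorrect_step(verdicts))
```

In this toy run the third step claims an area of 12 while the lookup returns 6, so it is the first step flagged as incorrect. The design point is that `query_tool` never sees the solver's claimed value.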

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Step-wise Verification | WeMath | Macro F1 63.9 | 18 |
| Step-wise Verification | MathVerse VO | Macro F1 61.9 | 18 |
| Step-wise Verification | DynaMath | Macro F1 65.9 | 18 |
| Step-wise Verification | MMMU, MathVision, MathVerse-VO, DynaMath, WeMath Overall | Macro F1 61.7 | 18 |
| Step-wise Verification | MMMU | Macro F1 58.3 | 18 |
| Step-wise Verification | MathVision | Macro F1 58.3 | 18 |
| First Incorrect Step Identification | MathVision | FISI F1 Score 26.2 | 6 |
| First Incorrect Step Identification | MathVerse VO | FISI F1 Score 29.6 | 6 |
| First Incorrect Step Identification | DynaMath | FISI F1 Score 26.7 | 6 |
| First Incorrect Step Identification | WeMath | FISI F1 Score 24.9 | 6 |
Showing 10 of 12 rows
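
For readers unfamiliar with the Macro F1 metric reported for step-wise verification, here is a minimal sketch of the standard definition: the F1 score is computed separately for each class and the per-class scores are averaged with equal weight. It assumes each step carries a binary label (1 for correct, 0 for incorrect). The benchmark's exact labeling and filtering protocol is not given on this page and may differ. The scores in the table appear to be on a 0 to 100 scale, while this sketch returns a value between 0 and 1.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> float:
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # By convention here, a class with no true or predicted members scores 0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], classes: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical step labels: 1 = step is correct, 0 = step is incorrect.
gold = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0]
print(round(macro_f1(gold, pred), 4))
```

Because both classes count equally, a verifier that labels every step as correct is penalized on the incorrect-step class even when correct steps dominate the data.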
