
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision

About

While Multimodal Large Language Models (MLLMs) have achieved impressive progress in vision-language understanding, they still struggle with complex multi-step reasoning, often producing logically inconsistent or partially correct solutions. A key limitation lies in the lack of fine-grained supervision over intermediate reasoning steps. To address this, we propose MM-PRM, a process reward model trained within a fully automated, scalable framework. We first build MM-Policy, a strong multimodal model trained on diverse mathematical reasoning data. Then, we construct MM-K12, a curated dataset of 10,000 multimodal math problems with verifiable answers, which serves as seed data. Leveraging a Monte Carlo Tree Search (MCTS)-based pipeline, we generate over 700k step-level annotations without human labeling. The resulting PRM is used to score candidate reasoning paths in the Best-of-N inference setup and achieves significant improvements across both in-domain (MM-K12 test set) and out-of-domain (OlympiadBench, MathVista, etc.) benchmarks. Further analysis confirms the effectiveness of soft labels, smaller learning rates, and path diversity in optimizing PRM performance. MM-PRM demonstrates that process supervision is a powerful tool for enhancing the logical robustness of multimodal reasoning systems. We release all our code and data at https://github.com/ModalMinds/MM-PRM.
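The Best-of-N setup and the soft labels mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Candidate`, `soft_label`, and `best_of_n` are hypothetical, the step scores are made-up numbers standing in for PRM outputs, and the choice of aggregation function (minimum step score by default) is an assumption, since the abstract does not say how step scores are combined into a path score.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass
class Candidate:
    """One sampled reasoning path (hypothetical container, not from the paper)."""
    answer: str
    step_scores: Sequence[float]  # assumed per-step PRM scores in [0, 1]


def soft_label(num_correct_rollouts: int, num_rollouts: int) -> float:
    """Monte Carlo soft label for a step: the fraction of rollouts continued
    from that step that reach the verified answer (assumed definition)."""
    if num_rollouts <= 0:
        raise ValueError("num_rollouts must be positive")
    return num_correct_rollouts / num_rollouts


def best_of_n(
    candidates: Sequence[Candidate],
    aggregate: Callable[[Sequence[float]], float] = min,
) -> Candidate:
    """Return the candidate whose aggregated step scores are highest.

    `aggregate` turns a list of step scores into one path score; `min`
    penalizes any single weak step, `statistics.mean` averages them.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: aggregate(c.step_scores))


# Illustrative data only: three candidate paths with invented step scores.
paths = [
    Candidate(answer="12", step_scores=[0.90, 0.20, 0.95]),
    Candidate(answer="15", step_scores=[0.70, 0.80, 0.75]),
    Candidate(answer="18", step_scores=[0.99, 0.99, 0.50]),
]

chosen_by_min = best_of_n(paths)                   # strictest: weakest step decides
chosen_by_mean = best_of_n(paths, aggregate=mean)  # averages over all steps
```

With these invented scores the two aggregations pick different paths, which is why the aggregation rule is worth stating explicitly when reporting Best-of-N results.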

Lingxiao Du, Fanqing Meng, Zongkai Liu, Zhixiang Zhou, Ping Luo, Qiaosheng Zhang, Wenqi Shao • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Step-wise Verification | DynaMath | Macro F1 | 58.1 | 18
Step-wise Verification | MathVision | Macro F1 | 55.4 | 18
Step-wise Verification | MathVerse VO | Macro F1 | 54.9 | 18
Step-wise Verification | MMMU, MathVision, MathVerse-VO, DynaMath, WeMath Overall | Macro F1 | 55.5 | 18
Step-wise Verification | WeMath | Macro F1 | 56.5 | 18
Step-wise Verification | MMMU | Macro F1 | 51.2 | 18
First Incorrect Step Identification | MathVision | FISI F1 Score | 15.2 | 6
First Incorrect Step Identification | MathVerse VO | FISI F1 Score | 15.3 | 6
First Incorrect Step Identification | DynaMath | FISI F1 Score | 16.4 | 6
First Incorrect Step Identification | WeMath | FISI F1 Score | 12.2 | 6
Showing 10 of 12 rows
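For readers unfamiliar with the Macro F1 metric reported for step-wise verification, the sketch below shows the standard definition: compute F1 separately for each class, then take the unweighted mean. The function name `macro_f1` and the binary step labels (1 for a correct step, 0 for an incorrect one) are assumptions for illustration; this page does not state the label set or scoring script the benchmarks use, and the listed results appear to be on a 0 to 100 scale.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every class seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one labeled step")

    labels = sorted(set(y_true) | set(y_pred), key=repr)
    per_class_f1 = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class_f1.append(2 * tp / denom if denom else 0.0)
    return sum(per_class_f1) / len(per_class_f1)


# Illustrative step labels only (1 = step judged correct, 0 = incorrect).
gold = [1, 1, 1, 0]
pred = [1, 1, 0, 0]
score = macro_f1(gold, pred)  # class 0 F1 = 2/3, class 1 F1 = 4/5
```

Because each class contributes equally, a verifier that labels nearly every step as correct scores poorly on the incorrect-step class and cannot reach a high Macro F1.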
