Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models

About

We present Athena-PRM, a multimodal process reward model (PRM) designed to assign a reward score to each step in solving complex reasoning problems. Developing high-performance PRMs typically demands significant time and financial investment, primarily because reasoning steps must be annotated at the step level. Conventional automated labeling methods, such as Monte Carlo estimation, often produce noisy labels and incur substantial computational costs. To efficiently generate high-quality process-labeled data, we propose leveraging prediction consistency between weak and strong completers as a criterion for identifying reliable process labels. Remarkably, Athena-PRM demonstrates outstanding effectiveness across various scenarios and benchmarks with just 5,000 samples. We also develop two effective strategies to improve the performance of PRMs: ORM initialization and up-sampling of negative data. We validate our approach in three specific scenarios: verification for test-time scaling, direct evaluation of reasoning step correctness, and reward-ranked fine-tuning. Athena-PRM consistently achieves superior performance across multiple benchmarks and scenarios. Notably, when using Qwen2.5-VL-7B as the policy model, Athena-PRM enhances performance by 10.2 points on WeMath and 7.1 points on MathVista for test-time scaling. Furthermore, Athena-PRM sets state-of-the-art (SoTA) results on VisualProcessBench, outperforming the previous SoTA by 3.9 F1 points and showcasing its robust capability to accurately assess the correctness of reasoning steps. Additionally, using Athena-PRM as the reward model, we develop Athena-7B with reward-ranked fine-tuning, which outperforms the baseline by a significant margin on five benchmarks.
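The consistency criterion described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how each completer's label is estimated or what happens to steps where the completers disagree. Here we assume a hard Monte Carlo label (a step counts as correct if any rollout from it reaches the right answer) and simply discard disagreeing steps. The names `mc_step_label` and `consistent_labels` are hypothetical.

```python
from typing import List, Optional


def mc_step_label(rollout_correct: List[bool]) -> int:
    """Hard Monte Carlo label (assumed rule): 1 if any rollout
    continued from this step reached the correct final answer, else 0."""
    return int(any(rollout_correct))


def consistent_labels(
    weak_rollouts: List[List[bool]],
    strong_rollouts: List[List[bool]],
) -> List[Optional[int]]:
    """Keep a step label only when the weak and strong completers agree.

    Each argument holds, per reasoning step, the correctness of the
    rollouts that one completer produced from that step. Steps where
    the two estimates differ are marked None (assumed: treated as
    unreliable and dropped from the training data).
    """
    labels: List[Optional[int]] = []
    for weak, strong in zip(weak_rollouts, strong_rollouts):
        w, s = mc_step_label(weak), mc_step_label(strong)
        labels.append(w if w == s else None)
    return labels


# Toy example: three reasoning steps, two rollouts per completer per step.
weak = [[True, False], [False, False], [False, False]]
strong = [[True, True], [True, False], [False, False]]
print(consistent_labels(weak, strong))  # [1, None, 0]
```

In this toy example the second step is discarded because only the strong completer recovers from it, which is the kind of ambiguous case a single-completer Monte Carlo estimate would label with more noise.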

Shuai Wang, Zhenhua Liu, Jiaheng Wei, Xuanwu Yin, Dong Li, Emad Barsoum • 2025

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	GSM8K	Accuracy (Acc) 94.8	352
Mathematical Multimodal Reasoning	MathVista	Accuracy 71.4	276
Multimodal Math Reasoning	MathVision	Accuracy 25.7	263
Mathematical Multimodal Reasoning	MathVerse	Accuracy 45.7	259
Multimodal Math Reasoning	WeMath	Accuracy 43	228
Multimodal Reasoning	MMMU	Accuracy 75.8	220
Multimodal Reasoning	WeMath	Accuracy 58.7	199
Multimodal Reasoning	LogicVista	Accuracy 60.9	172
Multimodal Reasoning	MathVision	Accuracy 44.8	162
Mathematical Reasoning	DynaMath	Accuracy 21.9	146

Showing 10 of 17 rows
