Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models

About

We present Athena-PRM, a multimodal process reward model (PRM) designed to assign a reward score to each step in the solution of a complex reasoning problem. Developing high-performance PRMs typically demands significant time and financial investment, primarily because the reasoning steps need step-level annotations. Conventional automated labeling methods, such as Monte Carlo estimation, often produce noisy labels and incur substantial computational costs. To generate high-quality process-labeled data efficiently, we propose using prediction consistency between weak and strong completers as the criterion for identifying reliable process labels. With just 5,000 samples, Athena-PRM is effective across a range of scenarios and benchmarks. We also develop two strategies that improve PRM performance: ORM initialization and up-sampling of negative data. We validate our approach in three scenarios: verification for test-time scaling, direct evaluation of reasoning-step correctness, and reward-ranked fine-tuning. Athena-PRM consistently achieves superior performance across these benchmarks and scenarios. Notably, with Qwen2.5-VL-7B as the policy model, Athena-PRM improves performance by 10.2 points on WeMath and 7.1 points on MathVista for test-time scaling. Athena-PRM also sets new state-of-the-art (SoTA) results on VisualProcessBench, outperforming the previous SoTA by 3.9 F1 points, which shows that it can accurately assess the correctness of individual reasoning steps. Finally, using Athena-PRM as the reward model, we develop Athena-7B with reward-ranked fine-tuning; it outperforms the baseline by a significant margin on five benchmarks.
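The consistency criterion described above can be illustrated with a small sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: the `Completer` signature, the hard Monte Carlo labeling rule (a step counts as correct if any rollout from it reaches the reference answer), the rollout count, and the toy completers. It is not the authors' code; it only shows the idea of keeping a step label when the weak and strong completers agree on it.

```python
from typing import Callable, List, Optional

# Hypothetical interface: a completer continues a partial solution and
# returns a final answer string. Real completers would be policy models.
Completer = Callable[[str, List[str]], str]


def mc_step_label(question: str, prefix: List[str], completer: Completer,
                  gold: str, n_rollouts: int = 8) -> int:
    """Hard Monte Carlo label (assumed rule): 1 if any rollout that
    continues from this prefix reaches the gold answer, else 0."""
    return int(any(completer(question, prefix) == gold
                   for _ in range(n_rollouts)))


def consistent_step_labels(question: str, steps: List[str], gold: str,
                           weak: Completer, strong: Completer,
                           n_rollouts: int = 8) -> List[Optional[int]]:
    """Label every step with both completers and keep the label only
    when they agree; disagreements are marked None (treated as noisy)."""
    labels: List[Optional[int]] = []
    for i in range(1, len(steps) + 1):
        prefix = steps[:i]
        w = mc_step_label(question, prefix, weak, gold, n_rollouts)
        s = mc_step_label(question, prefix, strong, gold, n_rollouts)
        labels.append(w if w == s else None)
    return labels


# Toy deterministic completers, purely for demonstration.
def toy_strong(question: str, prefix: List[str]) -> str:
    # Recovers the answer from any prefix that contains no bad step.
    return "42" if "bad" not in prefix else "0"


def toy_weak(question: str, prefix: List[str]) -> str:
    # Needs at least two good steps before it can finish correctly.
    return "42" if "bad" not in prefix and len(prefix) >= 2 else "0"


demo_labels = consistent_step_labels(
    "toy question", ["a", "b", "bad"], "42", toy_weak, toy_strong)
print(demo_labels)  # [None, 1, 0]
```

In this toy run the first step is dropped because the two completers disagree, the second is kept as correct, and the third is kept as incorrect. Steps marked `None` would simply be excluded from the PRM training set under this sketch.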

Shuai Wang, Zhenhua Liu, Jiaheng Wei, Xuanwu Yin, Dong Li, Emad Barsoum • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Mathematical Reasoning | GSM8K | Accuracy (Acc) | 94.8 | 337
Mathematical Multimodal Reasoning | MathVerse | Accuracy | 45.7 | 259
Mathematical Multimodal Reasoning | MathVista | Accuracy | 71.4 | 258
Multimodal Math Reasoning | MathVision | Accuracy | 25.7 | 246
Multimodal Math Reasoning | WeMath | Accuracy | 43 | 211
Multimodal Reasoning | MMMU | Accuracy | 75.8 | 208
Multimodal Reasoning | WeMath | Accuracy | 58.7 | 171
Multimodal Reasoning | MathVision | Accuracy | 44.8 | 162
Multimodal Reasoning | LogicVista | Accuracy | 60.9 | 147
Multimodal Reasoning | MathVerse | Accuracy | 54.6 | 130
Showing 10 of 17 rows
