SCOPE: Compress Mathematical Reasoning Steps for Efficient Automated Process Annotation

About

Process Reward Models (PRMs) have demonstrated promising results in mathematical reasoning, but existing process annotation approaches, whether based on human annotation or Monte Carlo simulation, remain computationally expensive. In this paper, we introduce Step COmpression for Process Estimation (SCOPE), a novel compression-based approach that significantly reduces annotation costs. We first translate natural language reasoning steps into code and normalize them through their Abstract Syntax Trees, then merge equivalent steps to construct a prefix tree. Unlike simulation-based methods, which spend numerous samples on estimation, SCOPE leverages a compression-based prefix tree in which each root-to-leaf path serves as a training sample, reducing the complexity from $O(NMK)$ to $O(N)$. We construct a large-scale dataset containing 196K samples using only 5% of the computational resources required by previous methods. Empirical results demonstrate that PRMs trained on our dataset consistently outperform existing automated annotation approaches on both Best-of-N evaluation and ProcessBench.
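
The abstract describes the normalize-then-merge pipeline only at a high level. The following is a minimal sketch of those two steps under explicit assumptions, not the authors' implementation: each reasoning step is assumed to be already translated into a Python snippet, normalization is limited to renaming assigned variables and re-printing the AST, and each step is scored by the fraction of merged solutions through its node that reach a correct final answer (an assumed labeling rule, not necessarily the one SCOPE uses). All class and function names (`normalize_steps`, `build_prefix_tree`, `root_to_leaf_paths`, `Node`) are hypothetical.

```python
import ast
from dataclasses import dataclass, field

# Hypothetical sketch: AST normalization + prefix-tree merging of reasoning steps.
# Assumes every step is already a Python snippet; requires Python 3.9+ (ast.unparse).


class _CanonicalNames(ast.NodeTransformer):
    """Rename variables to v0, v1, ... in order of first assignment.

    Names that are only read (builtins, modules, free variables) are kept,
    so e.g. abs(x) and round(x) stay distinct after normalization.
    """

    def __init__(self):
        self.mapping = {}

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store) and node.id not in self.mapping:
            self.mapping[node.id] = f"v{len(self.mapping)}"
        node.id = self.mapping.get(node.id, node.id)
        return node


def normalize_steps(code_steps):
    """Parse each step and re-emit it with canonical names and formatting."""
    renamer = _CanonicalNames()  # one mapping per solution, shared across its steps
    return [ast.unparse(renamer.visit(ast.parse(src))) for src in code_steps]


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # normalized step -> Node
    visits: int = 0   # solutions passing through this node
    correct: int = 0  # of those, how many reached the correct final answer


def build_prefix_tree(solutions):
    """Merge (normalized_steps, is_correct) pairs so identical prefixes share nodes."""
    root = Node()
    for steps, is_correct in solutions:
        node = root
        for step in steps:
            if step not in node.children:
                node.children[step] = Node()
            node = node.children[step]
            node.visits += 1
            node.correct += int(is_correct)
    return root


def root_to_leaf_paths(root):
    """Yield one (steps, per-step scores) training sample per root-to-leaf path.

    Assumed scoring rule: a step's score is the fraction of merged solutions
    through that node whose final answer was correct.
    """
    stack = [(root, [], [])]
    while stack:
        node, steps, scores = stack.pop()
        if not node.children and steps:
            yield steps, scores
        for step, child in node.children.items():
            stack.append((child, steps + [step], scores + [child.correct / child.visits]))


# Toy input: (code steps, final answer correct?) for three sampled solutions.
raw_solutions = [
    (["a = 3 * 4", "b = a + 2"], True),
    (["total = (3*4)", "answer = total + 2"], True),
    (["x = 3 * 4", "y = x - 2"], False),
]
tree = build_prefix_tree([(normalize_steps(s), ok) for s, ok in raw_solutions])
samples = list(root_to_leaf_paths(tree))
for steps, scores in samples:
    print(steps, [round(s, 2) for s in scores])
```

In this toy input the three raw solutions normalize to a shared first step (`v0 = 3 * 4`), so they collapse into two root-to-leaf paths and yield two training samples. Renaming only assigned names is a deliberately conservative choice; a fuller normalizer might also canonicalize commutative operands or fold constant expressions.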

Huimin Xu, Xin Mao, Feng-Lin Li, Xiaobao Wu, Wang Chen, Wei Zhang, Anh Tuan Luu · 2025

Related benchmarks

Task	Dataset	Metric	Result	
Mathematical Reasoning	MATH	Accuracy	87.7	882
Mathematical Reasoning	GSM8K	Accuracy	96.7	358
Mathematical Reasoning	CollegeMATH	Accuracy	48.3	327
Mathematical Reasoning	Minerva Math	Accuracy	38.2	228
Mathematical Reasoning	Olympiad Bench	Pass@1 Accuracy	46.8	115
Mathematical Reasoning	GaoKao En 2023	Pass@1 Accuracy	72.2	66
Process-level verification	MATH ProcessBench (test)	Error Rate	15.9	26
Process-level verification	ProcessBench Aggregate (test)	Avg F1	54.4	13
Process-level verification	OlympiadBench ProcessBench (test)	Error	23.8	13
Process-level verification	GSM8K ProcessBench (test)	Error Rate	53.6	13
