| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning Process Evaluation | PROCESSBENCH | GSM8K Accuracy: 82.9 | 28 |
| Reasoning | ProcessBench | Accuracy: 69.85 | 20 |
| Process Verification | ProcessBench Without Standard Answers | Precise Accuracy: 71.9 | 18 |
| Process Verification | ProcessBench With Standard Answers | Precise Accuracy: 78.9 | 18 |
| Process-level verification | ProcessBench Aggregate (test) | Avg F1: 56.5 | 13 |
| Step-level Correctness Discrimination | ProcessBench GSM8K MATH Olympiad Bench Omni Math | GSM8K Error Rate: 0.242 | 12 |
| Process Reward Model Assessment | PROCESSBENCH | GSM8K Accuracy: 70.8 | 11 |