Mathematical Problem Solving on AIME '25 (test)
Top result: 22.71 Avg@16 Acc, PRM-CoT (Process-Aware), Dec 2, 2025
Metrics: Avg@16 Acc, Pass@1 Acc, Pass@8 Acc, Pass@16 Acc
Evaluation Results
| Method | Base Policy | Date | Avg@16 Acc | Pass@1 Acc | Pass@8 Acc | Pass@16 Acc |
| PRM-CoT (Process-Aware) | Qwen2.5-Ma... | 2025.12 | 22.71 | 20 | 36.67 | 40 |
| RLVR (Ground Truth) | Qwen2.5-Ma... | 2025.12 | 15.83 | 16.67 | 30 | 33.33 |
| PRM (Process-Aware) | Qwen2.5-Ma... | 2025.12 | 12.92 | 16.67 | 30 | 33.33 |
| SFT (Baseline) | Qwen2.5-Ma... | 2025.12 | 6.67 | 3.3 | 16.67 | 23.33 |
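The page does not define its metric columns. As a rough sketch, assuming Avg@k is the mean accuracy over k sampled answers per problem and Pass@k is the share of problems solved by at least one of the first k samples, they could be computed as below. The function names and the toy correctness matrix are hypothetical, and the benchmark may use a different estimator. The reported numbers are at least consistent with a 30-problem test set under this reading (for example, 20 = 6/30 and 22.71 ≈ 109/480).

```python
from typing import List


def avg_at_k(correct: List[List[bool]], k: int) -> float:
    """Percent of correct answers among the first k samples of every problem."""
    total_correct = sum(sum(row[:k]) for row in correct)
    return 100.0 * total_correct / (len(correct) * k)


def pass_at_k(correct: List[List[bool]], k: int) -> float:
    """Percent of problems with at least one correct answer in the first k samples.

    Assumption: this is the simple empirical version, not an unbiased estimator.
    """
    solved = sum(any(row[:k]) for row in correct)
    return 100.0 * solved / len(correct)


# Hypothetical toy data: 3 problems, 4 sampled answers each (True = correct).
toy = [
    [True, False, True, True],
    [False, False, False, True],
    [False, False, False, False],
]

print(f"Avg@4:  {avg_at_k(toy, 4):.2f}")
print(f"Pass@1: {pass_at_k(toy, 1):.2f}")
print(f"Pass@4: {pass_at_k(toy, 4):.2f}")
```

Under these definitions Avg@k rewards consistency across samples, while Pass@k only asks whether any sample succeeds, which is why the Pass@16 column can sit well above Avg@16 for the same method.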