Process-level Reward Modeling on PROCESSBENCH MATH
[Chart: Error Rate over time. Highlighted entry: SPARE-Llama3-8B, Error Rate 6.1, Jun 18, 2025. Selectable metrics: Error Rate, Accuracy, F1 Score.]
Updated 1mo ago
Evaluation Results
| Method | # Train | Date | Error Rate | Accuracy | F1 Score |
| --- | --- | --- | --- | --- | --- |
| SPARE-Llama3-8B | 40.5K | 2025.06 | 6.1 | 91.6 | 11.4 |
| SPARE-Qwen2.5-3B | 40.5K | 2025.06 | 16 | 89.2 | 27.1 |
| Math-Shepherd-7B | 440K | 2025.06 | 18 | 82 | 29.5 |
| RLHFlow-Deepseek-8B | 250K | 2025.06 | 21.4 | 80 | 33.8 |
| Skywork-7B | | 2025.06 | 43.8 | 62.2 | 53.6 |
| Qwen-2.5-Math-7B-PRM800K (Human) | 250K | 2025.06 | 48 | 90.1 | 62.6 |
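The page does not say how the F1 Score is derived. One plausible reading, and it is only an assumption here, is that it is the harmonic mean of the other two score columns, which is how ProcessBench defines F1 over its two per-subset accuracies. The sketch below recomputes that value from the results listed above and reports any row where the listed F1 does not match. The names `ROWS`, `harmonic_mean`, and `check_rows` are illustrative and not part of any benchmark tooling.

```python
# Rows copied from the results above: (method, error_rate, accuracy, listed_f1).
ROWS = [
    ("SPARE-Llama3-8B", 6.1, 91.6, 11.4),
    ("SPARE-Qwen2.5-3B", 16.0, 89.2, 27.1),
    ("Math-Shepherd-7B", 18.0, 82.0, 29.5),
    ("RLHFlow-Deepseek-8B", 21.4, 80.0, 33.8),
    ("Skywork-7B", 43.8, 62.2, 53.6),
    ("Qwen-2.5-Math-7B-PRM800K (Human)", 48.0, 90.1, 62.6),
]


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two scores; returns 0.0 when both are zero."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def check_rows(rows, tol: float = 0.06):
    """Return methods whose listed F1 differs from the recomputed one by more than tol.

    ASSUMPTION: F1 = harmonic mean of the two score columns. The tolerance
    allows for the one-decimal rounding used on the page.
    """
    return [
        name
        for name, err, acc, listed_f1 in rows
        if abs(harmonic_mean(err, acc) - listed_f1) > tol
    ]


if __name__ == "__main__":
    for name, err, acc, listed_f1 in ROWS:
        print(f"{name}: recomputed {harmonic_mean(err, acc):.1f}, listed {listed_f1}")
    print("Rows that do not match:", check_rows(ROWS))
```

Under this assumption, five of the six rows reproduce the listed F1 within rounding. Skywork-7B does not (the harmonic mean of 43.8 and 62.2 is about 51.4, against a listed 53.6), so that entry may be aggregated differently, for example averaged across subsets.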