Process-level Reward Modeling on PROCESSBENCH Olymp.Bench
[Chart: Error, Correctness, and F1 Score per method. Highlighted point: SPARE-Llama3-8B (Jun 18, 2025), Error 3.3.]
Evaluation Results
| Method | # Train | Date | Error | Correctness | F1 Score |
|---|---|---|---|---|---|
| SPARE-Llama3-8B | 40.5K | 2025.06 | 3.3 | 87.6 | 6.4 |
| RLHFlow-Deepseek-8B | 250K | 2025.06 | 10.1 | 51 | 16.9 |
| SPARE-Qwen2.5-3B | 40.5K | 2025.06 | 11.1 | 85 | 19.6 |
| Math-Shepherd-7B | 440K | 2025.06 | 15 | 71.1 | 24.8 |
| Skywork-7B | not listed | 2025.06 | 17.9 | 31.9 | 22.9 |
| Qwen-2.5-Math-7B-PRM800K (Human) | 250K | 2025.06 | 35.7 | 87.3 | 50.7 |
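
The reported F1 Score values are consistent with the harmonic mean of the Error and Correctness columns. That matches the usual ProcessBench convention, where these two columns are read as accuracy on erroneous samples and accuracy on correct samples. The page itself does not state this, so treat it as an assumption. The helper name `harmonic_f1` below is hypothetical, and the sketch only recomputes the figures listed above:

```python
def harmonic_f1(error_acc: float, correct_acc: float) -> float:
    """Harmonic mean of two accuracies given in percent.

    Assumption: 'Error' is accuracy on erroneous samples and
    'Correctness' is accuracy on correct samples.
    """
    total = error_acc + correct_acc
    if total == 0:
        return 0.0  # avoid division by zero when both accuracies are 0
    return 2 * error_acc * correct_acc / total


# (method, Error, Correctness, reported F1) copied from the results above
rows = [
    ("SPARE-Llama3-8B", 3.3, 87.6, 6.4),
    ("RLHFlow-Deepseek-8B", 10.1, 51.0, 16.9),
    ("SPARE-Qwen2.5-3B", 11.1, 85.0, 19.6),
    ("Math-Shepherd-7B", 15.0, 71.1, 24.8),
    ("Skywork-7B", 17.9, 31.9, 22.9),
    ("Qwen-2.5-Math-7B-PRM800K (Human)", 35.7, 87.3, 50.7),
]

for name, err, cor, reported in rows:
    computed = harmonic_f1(err, cor)
    print(f"{name}: computed {computed:.1f}, reported {reported}")
```

Because the harmonic mean is dominated by the smaller of the two values, a method with high Correctness but low Error accuracy (for example SPARE-Llama3-8B at 87.6 and 3.3) still ends up with a low F1 Score.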