Process-level Reward Modeling on PROCESSBENCH Omni-MATH
[Chart: Error Rate over time. Plotted point: SPARE-Llama3-8B, Error Rate 2.8, Jun 18, 2025. Selectable metrics: Error Rate, Accuracy, F1 Score.]
Evaluation Results
| Method | # Train | Date | Error Rate | Accuracy | F1 Score |
|---|---|---|---|---|---|
| SPARE-Llama3-8B | 40.5K | 2025.06 | 2.8 | 82.2 | 5.4 |
| RLHFlow-Deepseek-8B | 250K | 2025.06 | 10.9 | 51.9 | 16.9 |
| Skywork-7B | | 2025.06 | 14 | 41.9 | 21 |
| SPARE-Qwen2.5-3B | 40.5K | 2025.06 | 14 | 83.8 | 23.9 |
| Math-Shepherd-7B | 440K | 2025.06 | 14.2 | 73 | 23.8 |
| Qwen-2.5-Math-7B-PRM800K (Human) | 250K | 2025.06 | 29.8 | 86.1 | 44.3 |