Process-level verification on ProcessBench Aggregate (test)
Best Avg F1: 56.5 (Qwen2.5-Math-7B-PRM800K, May 20, 2025)
Evaluation Results
| Method | Configuration | Date | Avg F1 |
|---|---|---|---|
| Qwen2.5-Math-7B-PRM800K | training_data_source=h... | 2025.05 | 56.5 |
| SCOPE | AST Normalization=true... | 2025.05 | 54.4 |
| SCOPE (w/o AST Normalization) | AST Normalization=fals... | 2025.05 | 53.6 |
| SCOPE (w/o Code Translation) | AST Normalization=true... | 2025.05 | 50.6 |
| SCOPE (Step Replacement) | Step Compression Strat... | 2025.05 | 44.6 |
| SCOPE (Step Skipping) | Step Compression Strat... | 2025.05 | 44.6 |
| Skywork-PRM-7B | training_data_source=a... | 2025.05 | 42.1 |
| Skywork-PRM-1.5B | training_data_source=a... | 2025.05 | 36.4 |
| Math-Shepherd-PRM-7B | training_data_source=a... | 2025.05 | 31.5 |
| EurusPRM-Stage2 | training_data_source=a... | 2025.05 | 31.3 |
| EurusPRM-Stage1 | training_data_source=a... | 2025.05 | 31.2 |
| RLHFlow-PRM-Mistral-8B | training_data_source=a... | 2025.05 | 28.4 |
| RLHFlow-PRM-Deepseek-8B | training_data_source=a... | 2025.05 | 26.6 |