PROCESSBENCH

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Mathematical Reasoning Process Evaluation | PROCESSBENCH | GSM8K Accuracy: 82.9 | 28
Reasoning | ProcessBench | Accuracy: 69.85 | 20
Process Verification | ProcessBench Without Standard Answers | Precise Accuracy: 71.9 | 18
Process Verification | ProcessBench With Standard Answers | Precise Accuracy: 78.9 | 18
Process-Level Verification | ProcessBench Aggregate (test) | Avg F1: 56.5 | 13
Step-Level Correctness Discrimination | ProcessBench (GSM8K, MATH, OlympiadBench, Omni-MATH) | GSM8K Error Rate: 0.242 | 12
Process Reward Model Assessment | PROCESSBENCH | GSM8K Accuracy: 70.8 | 11