PRM800K

Benchmarks

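Task Name                  | Dataset Name    | Metric    | SOTA Result | Trend
---------------------------+-----------------+-----------+-------------+------
Mathematical Reasoning     | PRM800K (test)  | Accuracy  | 80          | 15
Math Reasoning             | PRM800K         | AUC-ROC   | 0.613       | 5
Instance-level Evaluation  | PRM800K         | AUC-ROC   | 0.42        | 1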