MATH and GSM8k

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Step-level error discrimination | MATH and GSM8k (test) | AUROC (Step-level Error Discrimination): 0.762 | 4 |
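The SOTA result is reported as AUROC. For step-level error discrimination, this metric is commonly read as the probability that a randomly chosen erroneous step receives a higher error score than a randomly chosen correct step. The sketch below computes that quantity with the rank-based (Mann-Whitney) formulation. It assumes binary per-step labels (1 = erroneous step, 0 = correct step) and a per-step score where higher means "more likely erroneous". The function name `auroc` is hypothetical and is not the benchmark's official scoring code.

```python
def auroc(scores, labels):
    """Hypothetical helper: AUROC via the Mann-Whitney formulation.

    Assumes labels are 1 for an erroneous step and 0 for a correct step,
    and that a higher score means the step is judged more likely erroneous.
    Returns the fraction of (erroneous, correct) pairs ranked correctly,
    counting tied scores as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]  # erroneous steps
    neg = [s for s, y in zip(scores, labels) if y == 0]  # correct steps
    if not pos or not neg:
        raise ValueError("need at least one erroneous and one correct step")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # erroneous step ranked above correct step
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: 3 of the 4 (erroneous, correct) pairs are ordered correctly.
    print(auroc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

Under this reading, a score of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic and is meant only to make the definition explicit. A sort-based version is preferable for large step counts.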