
Math Reasoning Tasks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Membership Auditing | Math Reasoning Tasks | AUC: 83.5 | 56 |
| Math reasoning | Math Reasoning Tasks (MultiArith, GSM8K, AddSub, AQUA, SingleEq, SVAMP, MAWPS) (test) | MultiArith: 99.7 | 23 |
| Math Reasoning | Math Reasoning Tasks Group 1 | MATH500 Score: 61.8 | 11 |
| Math Reasoning | Math Reasoning Tasks AIME, MATH500, GSM8K, AMC aggregated (test) | Pass@k Score: 64.04 | 9 |
| Mathematical Reasoning | Math reasoning tasks | Math Avg. Pass@4 (Before): 67.43 | 4 |