
AMC12

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | AMC12 | Expected Calibration Error (ECE) | 0.0563 | 22 |
| Answer Verification | AMC12 | AUROC | 91.05 | 22 |
| Mathematical Problem Solving | AMC12 | Best-of-N Accuracy | 58.66 | 17 |
| Mathematical Reasoning | AMC12 | ECE | 10.64 | 17 |
| Mathematical Reasoning | AMC12 | Accuracy | 74.69 | 12 |
| Mathematical Reasoning | AMC12 | Mean@4 | 54.44 | 11 |
| Mathematical Reasoning | AMC12 | Pass@1 | 66.27 | 10 |
| Mathematical Reasoning | AMC12 (test) | Accuracy | 46 | 8 |
| Mathematical Reasoning | AMC12 | Best-of-64 Accuracy | 58.66 | 3 |