GSM8K and MATH

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Mathematical Reasoning | GSM8K and MATH | GSM8K Score: 78.4 | 27
Mathematical Reasoning | GSM8K and MATH500 Aggregate | Avg Accuracy: 86.02 | 9
Arithmetic Reasoning | GSM8K and MATH Average | Average Accuracy: 43.4 | 7