gsm

Benchmarks

Task Name                            Dataset Name      Metric               SOTA Result  Trend
-----------------------------------  ----------------  -------------------  -----------  -----
Mathematical Reasoning               GSM-Hard          Accuracy             99           169
Mathematical Reasoning               GSM-Hard          Solve Rate           78           162
Mathematical Reasoning               GSM               Accuracy             92.5         35
Math Reasoning                       GSM Hard          Accuracy             66.9         31
Mathematical Reasoning               GSM-Hard          GSM-Hard pass@1 Acc  69.6         27
Mathematical Reasoning               GSM               Accuracy             61           27
Mathematical Reasoning (Calculator)  GSM-PLUS          Accuracy             76.54        25
Mathematical Reasoning               GSM-ICM           Accuracy             92.7         16
Math Reasoning                       GSM-H (held-out)  Accuracy (%)         57.54        14
Math                                 GSM-Plus          Score                89.74        10
Grade-school reasoning               GSM Hard          Pass@1 Success Rate  53.4         9
Correctness verification             GSM-Symbolic      LB                   0.435        8
Mathematical Reasoning               GSM               GSM Accuracy         92.16        7
Arithmetic Reasoning                 GSM Reversed      Accuracy             90.3         7
Mathematical Reasoning               GSM-SYS           Accuracy             80.9         7
Compiler phase ordering              gsm               Execution Cycles     6,178        7
Math Reasoning                       GSM MC            FPR                  3.75         5
Mathematical Reasoning               GSM Hard          Accuracy             24.6         5
Arithmetic Reasoning                 GSM               Accuracy             72.7         4
Mathematical Reasoning               GSM               Win Rate             48.9         1
Failure Flipping                     GSMPlus           Trial Success Rate   0.49         1
Showing 21 of 21 rows