
gsm

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | GSM-Hard | Accuracy: 99 | 169 |
| Mathematical Reasoning | GSM-Hard | Solve Rate: 78 | 162 |
| Reasoning | GSM PRO | Accuracy: 100 | 72 |
| Math Reasoning | GSM Hard | Accuracy: 82.6 | 67 |
| Mathematical Reasoning | GSM-Hard | Accuracy: 89.52 | 46 |
| Reasoning | GSM→FOL | Accuracy: 85.8 | 45 |
| Mathematical Reasoning | GSM | Accuracy: 94 | 45 |
| Mathematical Reasoning | GSM (test) | Accuracy: 65.4 | 42 |
| Mathematical Reasoning | GSM Hard | Accuracy: 68.6 | 28 |
| Mathematical Reasoning | GSM-Hard | GSM-Hard pass@1 Acc: 69.6 | 27 |
| Mathematical Reasoning | GSM | Accuracy: 61 | 27 |
| Mathematical Reasoning (Calculator) | GSM-PLUS | Accuracy: 76.54 | 25 |
| Math | GSM-Plus | Score: 89.74 | 22 |
| Mathematical Reasoning | GSM-ICM | Accuracy: 92.7 | 16 |
| Math Reasoning | GSM-H (held-out) | Accuracy (%): 57.54 | 14 |
| Mathematical Reasoning | GSM 8K | pass@K: 97.77 | 12 |
| Multi-objective reinforcement learning | RLVR-GSM | Multiplicative Gap (ε): 0.0112 | 12 |
| Grade-school reasoning | GSM Hard | Pass@1 Success Rate: 53.4 | 9 |
| Correctness verification | GSM-Symbolic | LB: 0.435 | 8 |
| Math Reasoning | GSM DE | Accuracy: 66 | 7 |
| Math Reasoning | GSM CoT | Accuracy (GSM CoT): 83.2 | 7 |
| Mathematical Reasoning | GSM | GSM Accuracy: 92.16 | 7 |
| Arithmetic Reasoning | GSM Reversed | Accuracy: 90.3 | 7 |
| Mathematical Reasoning | GSM-SYS | Accuracy: 80.9 | 7 |
| Compiler phase ordering | gsm | Execution Cycles: 6,178 | 7 |
Showing 25 of 30 rows