
gsm

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | GSM-Hard | Accuracy | 99 | 169 |
| Mathematical Reasoning | GSM-Hard | Solve Rate | 78 | 162 |
| Math Reasoning | GSM Hard | Accuracy | 82.6 | 73 |
| Reasoning | GSM PRO | Accuracy | 100 | 72 |
| Mathematical Reasoning | GSM | Accuracy | 94 | 70 |
| Mathematical Reasoning | GSM-Hard | Accuracy | 89.52 | 46 |
| Reasoning | GSM→FOL | Accuracy | 85.8 | 45 |
| Mathematical Reasoning | GSM-sym | Exact Match | 90.22 | 44 |
| Mathematical Reasoning | GSM (test) | Accuracy | 65.4 | 42 |
| Mathematical Reasoning | GSM-Hard | GSM-Hard pass@1 Acc | 69.6 | 40 |
| Mathematical Reasoning | GSM Hard | Accuracy | 68.6 | 31 |
| Math | GSM-Plus | Score | 90.7 | 28 |
| Mathematical Reasoning | GSM | GSM Accuracy | 92.16 | 27 |
| Mathematical Reasoning | GSM | Accuracy | 61 | 27 |
| Mathematical Reasoning (Calculator) | GSM-PLUS | Accuracy | 76.54 | 25 |
| Mathematical Reasoning | GSM-ICM | Accuracy | 92.7 | 16 |
| Mathematical Reasoning | GSM Hard | Accuracy | 10.8 | 15 |
| Math Reasoning | GSM-H (held-out) | Accuracy (%) | 57.54 | 14 |
| Mathematical Reasoning | GSM | Pass@1 Accuracy | 82.18 | 13 |
| Mathematical Reasoning | GSM 8K | pass@K | 97.77 | 12 |
| Multi-objective reinforcement learning | RLVR-GSM | Multiplicative Gap (ε) | 0.0112 | 12 |
| Mathematical Reasoning | GSM8K | Accuracy | 97.8 | 9 |
| Mathematical Reasoning | GSM Hard | Accuracy | 52.69 | 9 |
| Grade-school reasoning | GSM Hard | Pass@1 Success Rate | 53.4 | 9 |
| Correctness verification | GSM-Symbolic | LB | 0.435 | 8 |
Showing 25 of 39 rows