Mathematical Reasoning on GSM8K (Accuracy by Sample Count)
[Chart: Accuracy (1 sample) by date. Data point shown: DEL, 70, May 19, 2026. Selectable metrics: Accuracy (1 sample), Accuracy (10^1), Accuracy (10^2), Accuracy (10^3), Accuracy (10^4), Task Accuracy.]
Updated 13d ago
Evaluation Results
| Method | Configuration | Date | Accuracy (1 sample) | Accuracy (10^1) | Accuracy (10^2) | Accuracy (10^3) | Accuracy (10^4) | Task Accuracy |
|---|---|---|---|---|---|---|---|---|
| DEL | Backbone=Qwen2.5-1.5B | 2026.05 | 70 | 67 | 67.2 | 64.9 | 61.7 | 57 |
| EMO | Backbone=Qwen2.5-1.5B | 2026.05 | 69.3 | 64.1 | 66.3 | 66.4 | 59.6 | 54.5 |
| NTL | Backbone=Qwen2.5-1.5B | 2026.05 | 68.9 | 65.1 | 64.6 | 64.9 | 61.7 | 55.1 |
| DIST2Loss | Backbone=Qwen2.5-1.5B | 2026.05 | 67.9 | 63.5 | 64.6 | 61.1 | 53.2 | 54.2 |
| MLE | Backbone=Qwen2.5-1.5B,... | 2026.05 | 66.8 | 62.4 | 63.7 | 66.4 | 61.7 | 54 |
| MixCE | Backbone=Qwen2.5-1.5B | 2026.05 | 66.6 | 62.5 | 65.6 | 65.6 | 57.4 | 52.8 |