Numerical Reasoning on GSM8K (test) (Error Magnitude Accuracy)
[Chart: Accuracy (Error <= 1); leading method DEL at 70; date shown: May 19, 2026]
Updated 13d ago
Evaluation Results
| Method | Configuration | Links | Accuracy (Error <= 1) | Accuracy (Error <= 10) | Accuracy (Error <= 100) | Accuracy (Error <= 1000) | Accuracy (Error <= 10000) | Task Accuracy |
|---|---|---|---|---|---|---|---|---|
| DEL | Backbone=Qwen2.5-1.5B,... | 2026.05 | 70 | 67 | 67.2 | 64.9 | 61.7 | 57 |
| EMO | Backbone=Qwen2.5-1.5B,... | 2026.05 | 69.3 | 64.1 | 66.3 | 66.4 | 59.6 | 54.5 |
| NTL | Backbone=Qwen2.5-1.5B,... | 2026.05 | 68.9 | 65.1 | 64.6 | 64.9 | 61.7 | 55.1 |
| DIST2Loss | Backbone=Qwen2.5-1.5B,... | 2026.05 | 67.9 | 63.5 | 64.6 | 61.1 | 53.2 | 54.2 |
| MLE | Backbone=Qwen2.5-1.5B,... | 2026.05 | 66.8 | 62.4 | 63.7 | 66.4 | 61.7 | 54 |
| MixCE | Backbone=Qwen2.5-1.5B,... | 2026.05 | 66.6 | 62.5 | 65.6 | 65.6 | 57.4 | 52.8 |