Numerical Reasoning on GSM8K (test)
[Chart: MAE (Scale 1) by method. Leading entry: EMO, 1.13 (May 19, 2026). Selectable metrics: MAE (Scale 1), MAE (Scale 10^1), MAE (Scale 10^2), MAE (Scale 10^3), MAE (Scale 10^4), Global MAE.]
Evaluation Results
| Method | Details | Links | MAE (Scale 1) | MAE (Scale 10^1) | MAE (Scale 10^2) | MAE (Scale 10^3) | MAE (Scale 10^4) | Global MAE |
|---|---|---|---|---|---|---|---|---|
| EMO | Backbone=Qwen2.5-1.5B,... | 2026.05 | 1.13 | 1.08 | 0.85 | 0.98 | 1.72 | 0.71 |
| NTL | Backbone=Qwen2.5-1.5B,... | 2026.05 | 1.16 | 1.06 | 0.84 | 1.05 | 1.53 | 0.78 |
| DEL | Backbone=Qwen2.5-1.5B,... | 2026.05 | 1.17 | 1.03 | 0.81 | 0.91 | 1.17 | 0.63 |
| DIST2Loss | Backbone=Qwen2.5-1.5B,... | 2026.05 | 1.18 | 1.14 | 0.93 | 1.11 | 1.62 | 0.77 |
| MixCE | Backbone=Qwen2.5-1.5B,... | 2026.05 | 1.25 | 1.18 | 0.91 | 1.08 | 1.72 | 0.79 |
| MLE | Backbone=Qwen2.5-1.5B,... | 2026.05 | 1.26 | 1.2 | 0.91 | 1.11 | 1.57 | 0.72 |