Mathematical Reasoning on GSM8K (Parameter & Training Efficiency)
[Chart: Parameters (Full) (M) by method. Plotted entry: Hot-25% (MoE-Sieve), 509.7, Mar 25, 2026]
Updated 2mo ago
Evaluation Results
| Method | Backbone Model | Date | Parameters (Full) (M) | Parameters (Hot) (M) | Parameter Reduction | Checkpoint Size (Full) (GB) | Checkpoint Size (Hot) (MB) | Checkpoint Size Reduction | Training Time (Full) (min) | Training Time (Hot) (min) | Training Time Reduction |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Hot-25% (MoE-Sieve) | Qwen | 2026.03 | 509.7 | 151.3 | 70.3 | 2.04 | 606 | 71 | 3 | 1 | 49 |
| Hot-25% (MoE-Sieve) | OLMoE | 2026.03 | 311.5 | 85 | 72.7 | 1.25 | 340 | 73.4 | 1 | 54 | 50 |
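
The Parameter Reduction and Checkpoint Size Reduction figures above appear to be percentage savings of the "Hot" value relative to the "Full" value. The short sketch below checks that arithmetic under two assumptions that the page does not state: that reduction is computed as 100 × (1 − hot / full), and that 1 GB is taken as 1024 MB. The function and variable names are illustrative only. Training time is left out of the check.

```python
# Sanity check of the reduction columns, under the assumptions stated above.

MB_PER_GB = 1024  # assumption: binary gigabytes (not stated on the page)


def reduction_pct(full: float, hot: float) -> float:
    """Percentage saved when going from `full` to `hot` (same units)."""
    return 100.0 * (1.0 - hot / full)


# Values copied from the Evaluation Results rows.
ROWS = {
    "Qwen": {"params_full_m": 509.7, "params_hot_m": 151.3,
             "ckpt_full_gb": 2.04, "ckpt_hot_mb": 606.0},
    "OLMoE": {"params_full_m": 311.5, "params_hot_m": 85.0,
              "ckpt_full_gb": 1.25, "ckpt_hot_mb": 340.0},
}


def summarize(rows: dict) -> dict:
    """Return (parameter reduction %, checkpoint reduction %) per backbone."""
    out = {}
    for backbone, r in rows.items():
        param_red = reduction_pct(r["params_full_m"], r["params_hot_m"])
        # Convert the full checkpoint size to MB before comparing.
        ckpt_red = reduction_pct(r["ckpt_full_gb"] * MB_PER_GB, r["ckpt_hot_mb"])
        out[backbone] = (round(param_red, 1), round(ckpt_red, 1))
    return out


if __name__ == "__main__":
    for backbone, (p, c) in summarize(ROWS).items():
        print(f"{backbone}: parameters -{p}%, checkpoint -{c}%")
```

Under these assumptions the computed values round to the listed 70.3 / 71 for Qwen and 72.7 / 73.4 for OLMoE, which is what suggests the 1024 MB-per-GB convention rather than 1000.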