Mathematical Reasoning on GSM8K (EM strict, EM flex)
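This page does not define the two metrics. A common convention, assumed here because it approximates the GSM8K task in EleutherAI's lm-evaluation-harness, works as follows. Strict exact match (EM) accepts only an answer written after GSM8K's `#### ` final-answer marker. Flexible EM takes the last number-like token anywhere in the model output. The sketch below illustrates that difference. The regexes, the normalization, and the function names (`extract_strict`, `extract_flex`, `exact_match`) are illustrative assumptions and are not taken from this benchmark's evaluation code.

```python
import re
from typing import Optional

# Assumed patterns, modeled on the lm-evaluation-harness GSM8K task (not this benchmark's code).
# Strict: the answer must follow GSM8K's "#### " final-answer marker.
STRICT_RE = re.compile(r"#### (\-?[0-9\.\,]+)")
# Flexible: any number-like token; the last one in the output is used.
FLEX_RE = re.compile(r"(-?[$0-9.,]{2,})|(-?[0-9]+)")


def normalize(ans: str) -> str:
    # Drop thousands separators, dollar signs, and trailing periods.
    return ans.replace(",", "").replace("$", "").rstrip(".")


def extract_strict(output: str) -> Optional[str]:
    m = STRICT_RE.search(output)
    return normalize(m.group(1)) if m else None


def extract_flex(output: str) -> Optional[str]:
    matches = FLEX_RE.findall(output)
    if not matches:
        return None
    # findall returns one tuple of groups per match; keep whichever group matched.
    last = next(g for g in matches[-1] if g)
    return normalize(last)


def exact_match(pred: Optional[str], gold: str) -> int:
    # 1 if the extracted answer equals the normalized gold answer, else 0.
    return int(pred is not None and pred == normalize(gold))


# An output that reaches the right number but omits the "####" marker
# scores 0 under strict EM and 1 under flexible EM.
output = "She buys 5 more, so she has 8 apples."
print(exact_match(extract_strict(output), "8"), exact_match(extract_flex(output), "8"))
```

Under these assumed rules, any output that omits the `####` marker fails strict EM even when its final number is correct.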
[Chart: EM (strict) by date, with EM (flex) as an alternative view; top entry ReMoE at 18.14 EM (strict), May 26, 2026]
Evaluation Results
| Method | Configuration | Date | EM (strict) | EM (flex) |
|---|---|---|---|---|
| ReMoE | Backbone=Qwen1.5-MoE-A... | 2026.05 | 18.14 | 61.11 |
| Baseline | Backbone=Qwen1.5-MoE-A... | 2026.05 | 16.53 | 60.58 |
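The arithmetic below compares the two rows of the evaluation table. It assumes the scores are percentages on a 0 to 100 scale and that both rows use the same backbone, as the listed configuration suggests. The dictionary keys and helper names are hypothetical.

```python
# Scores copied from the evaluation table (assumed to be percentages).
scores = {
    "ReMoE": {"em_strict": 18.14, "em_flex": 61.11},
    "Baseline": {"em_strict": 16.53, "em_flex": 60.58},
}


def gain(metric: str) -> float:
    # Absolute gain of ReMoE over Baseline in percentage points,
    # rounded to the table's two-decimal precision.
    return round(scores["ReMoE"][metric] - scores["Baseline"][metric], 2)


def strict_flex_gap(method: str) -> float:
    # How much higher flexible EM is than strict EM for one method.
    return round(scores[method]["em_flex"] - scores[method]["em_strict"], 2)


for metric in ("em_strict", "em_flex"):
    print(metric, gain(metric))
for method in scores:
    print(method, strict_flex_gap(method))
```

ReMoE gains 1.61 points under strict EM and 0.53 points under flexible EM. For both methods, flexible EM is roughly 43 to 44 points higher than strict EM. The choice of extraction rule therefore moves the scores far more than the difference between the two methods does.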