Mathematical Reasoning on GSM8K (EM Strict/Flex)
[Chart: EM (Strict) by date, switchable to EM (Flex); top result Baseline at 38.89 EM (Strict), May 26, 2026]
Evaluation Results
Method     Details                      Links     EM (Strict)   EM (Flex)
Baseline   Evaluation Framework=l...    2026.05   38.89         39.04
ReMoE      Evaluation Framework=l...    2026.05   38.13         38.36
CE-only    Evaluation Framework=l...    2026.05   36.92         37.23
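The page does not define how EM (Strict) and EM (Flex) differ. The following is a minimal sketch under one common GSM8K convention, which may not be the one this leaderboard uses. It assumes strict scoring accepts only an answer written after a "####" marker (the format of GSM8K reference solutions) and flexible scoring takes the last number anywhere in the model output. The regexes and the function names (normalize, extract_strict, extract_flex, exact_match) are hypothetical and are not the evaluation framework's actual code.

```python
import re
from typing import Callable, List, Optional

# Hypothetical strict rule: the answer must follow a "####" marker,
# the format used by GSM8K reference solutions.
STRICT_RE = re.compile(r"####\s*(-?[0-9.,]+)")
# Hypothetical flexible rule: accept the last number anywhere in the output.
NUMBER_RE = re.compile(r"-?[0-9][0-9,]*(?:\.[0-9]+)?")


def normalize(num: str) -> str:
    """Drop thousands separators and a trailing period, e.g. '1,200.' -> '1200'."""
    return num.replace(",", "").rstrip(".")


def extract_strict(output: str) -> Optional[str]:
    """Return the number after '####', or None if the marker is missing."""
    m = STRICT_RE.search(output)
    return normalize(m.group(1)) if m else None


def extract_flex(output: str) -> Optional[str]:
    """Return the last number in the output, or None if there is none."""
    matches = NUMBER_RE.findall(output)
    return normalize(matches[-1]) if matches else None


def exact_match(outputs: List[str], targets: List[str],
                extract: Callable[[str], Optional[str]]) -> float:
    """Percentage of outputs whose extracted answer equals the normalized target."""
    hits = sum(extract(o) == normalize(t) for o, t in zip(outputs, targets))
    return 100.0 * hits / len(targets)


if __name__ == "__main__":
    outputs = ["5 * 3 = 15\n#### 15", "The answer is 15.", "#### 7"]
    targets = ["15", "15", "8"]
    print(exact_match(outputs, targets, extract_strict))  # 1 of 3 correct
    print(exact_match(outputs, targets, extract_flex))    # 2 of 3 correct
```

Under these assumptions, an output that reaches the right number without the "####" marker counts only under the flexible rule. That is one way an EM (Flex) score can exceed the EM (Strict) score, as it does for every row in the table.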