Reasoning on BIG-Bench Hard and GSM8K
[Chart: BBH Score and GSM8K Score by model. Highlighted point: Qwen3.5 2B, BBH Score 45.2, May 26, 2026.]
Updated 7d ago
Evaluation Results
| Method | Details | Date | BBH Score | GSM8K Score |
|---|---|---|---|---|
| Qwen3.5 2B | Nact/Ntotal=1.9B, Mode... | 2026.05 | 45.2 | 61.3 |
| MobileMoE-L | Nact/Ntotal=922M/5.3B,... | 2026.05 | 40.1 | 77.6 |
| MobileMoE-M | Nact/Ntotal=528M/2.8B,... | 2026.05 | 39 | 67.5 |
| Qwen3.5 0.8B | Nact/Ntotal=749M, Mode... | 2026.05 | 37.8 | 45.7 |
| OLMoE-1B-7B | Nact/Ntotal=1.3B/6.9B,... | 2026.05 | 37.1 | 49.1 |
| Gemma 3 1B | Nact/Ntotal=1.0B, Mode... | 2026.05 | 35.8 | 38.9 |
| SmolLM2 1.7B | Nact/Ntotal=1.7B, Mode... | 2026.05 | 35.3 | 46.1 |
| OLMo 2 1B | Nact/Ntotal=1.5B, Mode... | 2026.05 | 35 | 46.9 |
| Llama 3.2 1B | Nact/Ntotal=1.2B, Mode... | 2026.05 | 33.9 | 46 |
| MobileMoE-S | Nact/Ntotal=272M/1.3B,... | 2026.05 | 32.2 | 52.2 |
| Gemma 3 270M | Nact/Ntotal=270M, Mode... | 2026.05 | 31.8 | 5.8 |
| SmolLM2 360M | Nact/Ntotal=362M, Mode... | 2026.05 | 30.5 | 10 |
| MobileLLM-Pro | Nact/Ntotal=1.1B, Mode... | 2026.05 | 29.1 | 31.8 |