Reasoning on BBH, GPQA, and MuSR Suite
[Chart: scores on BBH, GPQA, MuSR, and Average Score. Highlighted point: Qwen3-14B, 83.4 on BBH, Jan 14, 2026.]
Updated 4d ago
Evaluation Results

Method                 Details                   Date      BBH    GPQA   MuSR   Average Score
Qwen3-14B              Parameters=14B            2026.01   83.4   49.8   57.7   63.6
Qwen3-4B               Parameters=4B             2026.01   79     39.8   58.5   59.1
Mi:dm 2.0 Base-inst    Type=Instruction-tuned    2026.01   77.7   33.5   51.9   54.4
Llama-3.1-8B-inst      Parameters=8B, Type=In... 2026.01   60.3   21.6   50.3   44.1
Exaone-3.5-7.8B-inst   Parameters=7.8B, Type=... 2026.01   50.1   33.1   51.2   44.8
Exaone-3.5-2.4B-inst   Parameters=2.4B, Type=... 2026.01   46.4   28.1   49.7   41.4
Mi:dm 2.0 Mini-inst    Type=Instruction-tuned    2026.01   44.5   26.6   51.7   40.9
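
The page does not say how Average Score is computed. The listed values are consistent with an unweighted mean of the BBH, GPQA, and MuSR scores rounded to one decimal, so the sketch below checks that assumption against the scores above. The helper name `mean_score` and the `rows` dictionary are illustrative and not part of the leaderboard.

```python
# Scores copied from the results above: (BBH, GPQA, MuSR, listed Average Score).
rows = {
    "Qwen3-14B": (83.4, 49.8, 57.7, 63.6),
    "Qwen3-4B": (79.0, 39.8, 58.5, 59.1),
    "Mi:dm 2.0 Base-inst": (77.7, 33.5, 51.9, 54.4),
    "Llama-3.1-8B-inst": (60.3, 21.6, 50.3, 44.1),
    "Exaone-3.5-7.8B-inst": (50.1, 33.1, 51.2, 44.8),
    "Exaone-3.5-2.4B-inst": (46.4, 28.1, 49.7, 41.4),
    "Mi:dm 2.0 Mini-inst": (44.5, 26.6, 51.7, 40.9),
}


def mean_score(bbh, gpqa, musr):
    """Unweighted mean of three benchmark scores, rounded to one decimal.

    Assumption: this is how the Average Score column is derived.
    """
    return round((bbh + gpqa + musr) / 3, 1)


# Compare the recomputed mean with the listed average for every model.
for name, (bbh, gpqa, musr, listed) in rows.items():
    computed = mean_score(bbh, gpqa, musr)
    print(f"{name}: computed={computed} listed={listed}")
```

Note that the rows are ordered by BBH score rather than by Average Score: Llama-3.1-8B-inst (44.1) is listed above Exaone-3.5-7.8B-inst (44.8).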