Reasoning on BIG-Bench Extra Hard
Top score: 37.8 (Qwen3-30B-A3B-Inst-2507)
[Chart: Score and TPF by date (Feb 9, 2026)]
Evaluation Results
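| Method | Setting | Date | Score | TPF (tokens per forward pass) |
|---|---|---|---|---|
| Qwen3-30B-A3B-Inst-2507 | | 2026.02 | 37.8 | 1 |
| LLaDA2.1-flash | Inference Mode=Q Mode | 2026.02 | 35.77 | 3.17 |
| LLaDA2.1-flash | Inference Mode=S Mode | 2026.02 | 33.51 | 5.04 |
| LLaDA2.0-flash | | 2026.02 | 27.86 | 4.6 |
| Ling-flash-2.0 | | 2026.02 | 23.24 | 1 |
| Qwen3-8B | no think=true | 2026.02 | 18.27 | - |
| LLaDA2.0-mini | | 2026.02 | 16.47 | 2.03 |
| LLaDA2.1-mini | mode=Q Mode | 2026.02 | 15.78 | 1.66 |
| LLaDA2.1-mini | mode=S Mode | 2026.02 | 15.3 | 3.19 |
| Ling-mini-2.0 | | 2026.02 | 14.81 | - |

A "-" in the TPF column means no value was reported.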
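Assuming TPF counts tokens produced per forward pass (so higher means faster decoding), one way to read these results is to ask which configurations no other row matches or beats on both Score and TPF at once, i.e. the Score/TPF Pareto frontier. The sketch below is an illustrative analysis, not part of the benchmark: it hard-codes the rows from the results above, skips rows with no reported TPF, and keeps every row that is not dominated. The names `Row`, `ROWS`, and `pareto_frontier` are hypothetical.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    method: str
    setting: str
    score: float
    tpf: Optional[float]  # None where the table shows "-" (not reported)


# Rows copied from the results table above.
ROWS = [
    Row("Qwen3-30B-A3B-Inst-2507", "", 37.8, 1.0),
    Row("LLaDA2.1-flash", "Inference Mode=Q Mode", 35.77, 3.17),
    Row("LLaDA2.1-flash", "Inference Mode=S Mode", 33.51, 5.04),
    Row("LLaDA2.0-flash", "", 27.86, 4.6),
    Row("Ling-flash-2.0", "", 23.24, 1.0),
    Row("Qwen3-8B", "no think=true", 18.27, None),
    Row("LLaDA2.0-mini", "", 16.47, 2.03),
    Row("LLaDA2.1-mini", "mode=Q Mode", 15.78, 1.66),
    Row("LLaDA2.1-mini", "mode=S Mode", 15.3, 3.19),
    Row("Ling-mini-2.0", "", 14.81, None),
]


def pareto_frontier(rows):
    """Return rows not dominated on (score, tpf), keeping input order.

    A row is dominated if some other row has score >= and tpf >=,
    with at least one of the two strictly greater. Rows without a
    reported TPF are excluded, since they cannot be compared.
    """
    known = [r for r in rows if r.tpf is not None]
    frontier = []
    for r in known:
        dominated = any(
            o.score >= r.score
            and o.tpf >= r.tpf
            and (o.score > r.score or o.tpf > r.tpf)
            for o in known
        )
        if not dominated:
            frontier.append(r)
    return frontier


if __name__ == "__main__":
    for r in pareto_frontier(ROWS):
        print(f"{r.method} {r.setting}".strip(), r.score, r.tpf)
```

On these rows the frontier contains three entries: Qwen3-30B-A3B-Inst-2507 (highest score) and LLaDA2.1-flash in both Q Mode and S Mode. Every other row with a reported TPF is matched or beaten on both columns by one of them.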