Multitask Language Understanding on MMLU (Accuracy and Performance Gain)
[Chart: MMLU accuracy over time, switchable between Accuracy and Performance Gain; top entry Qwen3 8B Base at 73.5 accuracy, dated Jan 30, 2026]
Evaluation Results

Model                   Method                     Date     Accuracy  Performance Gain
Qwen3 8B Base           Classifier=Self-labeled    2026.01  73.5      -2.5
Qwen3 8B Base           Classifier=Majority-la...  2026.01  73.5      -2.5
Llama 3.1 8B            Classifier=Self-labeled    2026.01  63.9      -0.8
Llama 3.1 8B            Classifier=Majority-la...  2026.01  63.6      -1.1
Mistral Nemo Base 2407  Classifier=Self-labeled    2026.01  62.6      -1.3
Mistral Nemo Base 2407  Classifier=Majority-la...  2026.01  62.3      -1.6
Mistral 7B v0.3         Classifier=Self-labeled    2026.01  58.7      -1.3
Mistral 7B v0.3         Classifier=Majority-la...  2026.01  58.5      -1.5
Llama 3.2 1B            Classifier=Self-labeled    2026.01  43.8      1.5
Qwen3 0.6B              Classifier=Self-labeled    2026.01  42.0      -1.0
Qwen3 0.6B              Classifier=Majority-la...  2026.01  41.4      -1.6
Llama 3.2 1B            Classifier=Majority-la...  2026.01  41.1      -1.1
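The page does not define "Performance Gain". A plausible reading, which is an assumption and not stated anywhere on the page, is that it is the change in accuracy relative to an unlisted per-model baseline, in the same units as Accuracy, so that baseline = Accuracy - Performance Gain. The short Python sketch below transcribes the table rows and checks that reading: it computes the implied baseline for each row and the accuracy gap between the two classifier settings for each model. The helper names (implied_baseline, baselines_by_model, classifier_gap) are hypothetical, and the truncated label "Majority-la..." is kept exactly as it appears on the page.

```python
from collections import defaultdict

# Rows transcribed from the Evaluation Results table:
# (model, method, accuracy, performance_gain).
# "Majority-la..." is the truncated label exactly as shown on the page.
ROWS = [
    ("Qwen3 8B Base", "Self-labeled", 73.5, -2.5),
    ("Qwen3 8B Base", "Majority-la...", 73.5, -2.5),
    ("Llama 3.1 8B", "Self-labeled", 63.9, -0.8),
    ("Llama 3.1 8B", "Majority-la...", 63.6, -1.1),
    ("Mistral Nemo Base 2407", "Self-labeled", 62.6, -1.3),
    ("Mistral Nemo Base 2407", "Majority-la...", 62.3, -1.6),
    ("Mistral 7B v0.3", "Self-labeled", 58.7, -1.3),
    ("Mistral 7B v0.3", "Majority-la...", 58.5, -1.5),
    ("Llama 3.2 1B", "Self-labeled", 43.8, 1.5),
    ("Qwen3 0.6B", "Self-labeled", 42.0, -1.0),
    ("Qwen3 0.6B", "Majority-la...", 41.4, -1.6),
    ("Llama 3.2 1B", "Majority-la...", 41.1, -1.1),
]

SELF = "Self-labeled"
MAJORITY = "Majority-la..."


def implied_baseline(accuracy: float, gain: float) -> float:
    """ASSUMPTION: gain = accuracy - baseline, so baseline = accuracy - gain."""
    return round(accuracy - gain, 1)


def baselines_by_model(rows):
    """Map each model to the implied baseline from every row it appears in."""
    out = defaultdict(list)
    for model, _method, acc, gain in rows:
        out[model].append(implied_baseline(acc, gain))
    return dict(out)


def classifier_gap(rows):
    """Self-labeled minus Majority-la... accuracy, for models that have both rows."""
    acc_by_model = defaultdict(dict)
    for model, method, acc, _gain in rows:
        acc_by_model[model][method] = acc
    return {
        model: round(accs[SELF] - accs[MAJORITY], 1)
        for model, accs in acc_by_model.items()
        if SELF in accs and MAJORITY in accs
    }


if __name__ == "__main__":
    for model, values in baselines_by_model(ROWS).items():
        print(f"{model}: implied baseline(s) {values}")
    for model, gap in classifier_gap(ROWS).items():
        print(f"{model}: Self-labeled minus Majority-la... = {gap:+.1f}")
```

Under this assumption, each model's two rows point to the same baseline to within 0.1 points, which is consistent with rounding. In this table, Self-labeled accuracy is never lower than Majority-la... accuracy for the same model.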