Multi-task Language Understanding on MMLU-IT
[Leaderboard chart: state-of-the-art accuracy on MMLU-IT is 81.5, achieved by Qwen3-30B-A3B (latest data point Mar 17, 2026); updated 1 month ago.]
Evaluation Results
| Method | Group | Date | Accuracy |
|---|---|---|---|
| Qwen3-30B-A3B | Larger, Configur... | 2026.03 | 81.5 |
| Qwen3-30B-A3B | Larger, Configur... | 2026.03 | 81.5 |
| Gpt-oss-20b-high | Larger, Configur... | 2026.03 | 80.4 |
| Gpt-oss-20b-high | Larger, Configur... | 2026.03 | 80.2 |
| Qwen3-14B | Larger, Configur... | 2026.03 | 77.6 |
| Qwen3-14B | Larger, Configur... | 2026.03 | 77.6 |
| gemma-3-12b-it | Larger, Configur... | 2026.03 | 69.0 |
| gemma-3-12b-it | Larger, Configur... | 2026.03 | 68.5 |
| gemma-2-9b-it | Comparable, Conf... | 2026.03 | 67.2 |
| gemma-2-9b-it | Comparable, Conf... | 2026.03 | 66.5 |
| EngGPT2-16B-A3B | Comparable, Conf... | 2026.03 | 65.5 |
| EngGPT2-16B-A3B | Comparable, Conf... | 2026.03 | 65.5 |
| Llama-3.1-8B-Instruct | Comparable, Conf... | 2026.03 | 60.6 |
| Llama-3.1-8B-Instruct | Comparable, Conf... | 2026.03 | 60.6 |
| Moonlight-16B-A3B-Instruct | Comparable, Conf... | 2026.03 | 49.2 |
| Moonlight-16B-A3B-Instruct | Comparable, Conf... | 2026.03 | 49.2 |