Our new X account is live! Follow @wizwand_team for updates
SOTA Multitask Language Understanding benchmarks and papers with code | Wizwand
Multitask Language Understanding
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| MMLU (test) | Claude 3.5 Sonnet | Accuracy | 92.16 | 303 | 3d ago |
| MMLU | Average human expert performance | Accuracy | 89.8 | 206 | 2d ago |
| MMLU-Pro | GPT-5 (High) | Accuracy | 87.1 | 99 | 2d ago |
| MMLU (val) | TAIA | Accuracy | 63.16 | 58 | 3d ago |
| MMLU Exact split, o=3 | W/O Decontamination | Accuracy | 92.1 | 42 | 3d ago |
| CMMLU (test) | GPT 4o | Accuracy | 78.3 | 38 | 3d ago |
| MMLU | Tulu 3-SFT | pass@1 | 71.9 | 24 | 3d ago |
| MMLU Semantic-level split, o=3 | W/O Decontamination | Accuracy | 90.1 | 21 | 3d ago |
| MMLU f_con o=5 | W/O Decontamination | Accuracy (Exact) | 99.5 | 18 | 3d ago |
| MMMLU Swahili 1.0 (test) | CLO | Accuracy | 33.38 | 18 | 3d ago |
| MMMLU Korean 1.0 (test) | CLO | Accuracy | 41.94 | 18 | 3d ago |
| BIG-bench-lite 24 tasks | PaLM 540B | Score | 3,777 | 17 | 3d ago |
| MMLU-ProX non-EU languages (test) | Qwen-3-30B-A3B | Accuracy | 70.9 | 16 | 3d ago |
| MMMLU non-EU languages (test) | Qwen-3-30B-A3B | Accuracy | 77.4 | 16 | 3d ago |
| ArabicMMLU | GPT-4 | Accuracy | 72.5 | 16 | 3d ago |
| MMLU-pro | CortexDebate | RA | 59.33 | 16 | 3d ago |
| MMLU-ProX 24 official EU languages | Qwen-3-30B-A3B | Score | 73.1 | 14 | 3d ago |
| MMMLU 24 official EU languages | Qwen-3-30B-A3B | Overall Score | 80.6 | 14 | 3d ago |
| MMLU | Qwen3 8B Base | Accuracy | 73.5 | 12 | 3d ago |
| MMLU Hindi | Trinity Large (MoE) | Accuracy | 67 | 12 | 3d ago |
| MMLU Korean (test) | Trinity Large | Accuracy | 72 | 12 | 3d ago |
| MMLU | PROBELLM | Mean Accuracy (MA) | 73 | 12 | 3d ago |
| Global MMLU-Lite | BYOL-nya | Accuracy | 64.5 | 12 | 2d ago |
| MMLU | Trinity Large Preview | MMLU Score | 87.21 | 11 | 3d ago |
| MMLU Bengali (test) | Trinity Large (MoE) | MMLU Score | 62 | 11 | 3d ago |
Showing 25 of 41 rows