Massive Multitask Language Understanding on MMLU (Sub-category Performance)
[Chart: STEM Accuracy by method. Highlighted result: LeanQuant, 82.7 (May 6, 2026). Selectable metrics: STEM Accuracy, Humanities Accuracy, Social Sciences Accuracy, Other Categories Accuracy, Average Accuracy.]
Updated 27d ago
Evaluation Results
| Method | Model | Date | STEM Accuracy | Humanities Accuracy | Social Sciences Accuracy | Other Categories Accuracy | Average Accuracy |
|---|---|---|---|---|---|---|---|
| LeanQuant | Llama-3.1-405B-I... | 2026.05 | 82.7 | 83.2 | 90.6 | 87.7 | 86.1 |
| OSAQ+GPTQ | Llama-3.1-405B-I... | 2026.05 | 82.6 | 83.2 | 90.8 | 87.7 | 86.1 |
| GPTQ | Llama-3.1-405B-I... | 2026.05 | 82.3 | 82.6 | 90.5 | 87.5 | 85.7 |
| OSAQ+GPTQ | Mistral-Large-12... | 2026.05 | 76.7 | 77.4 | 89.3 | 85.7 | 82.3 |
| LeanQuant | Mistral-Large-12... | 2026.05 | 76.6 | 77.3 | 89.2 | 85.9 | 82.3 |
| GPTQ | Mistral-Large-12... | 2026.05 | 76.3 | 77.2 | 89.3 | 85.2 | 82.0 |