Multi-task Language Understanding on MMLU v1 (test)
[Chart: Accuracy over time. Best result: LLaMA-3-8B, 66.6 (Jul 11, 2025). Updated 1mo ago.]
Evaluation Results
| Method | Training Tokens (B) | Date | Accuracy |
|---|---|---|---|
| LLaMA-3-8B | 15... | 2025.07 | 66.6 |
| Gemma-7B | 60... | 2025.07 | 62.9 |
| Mistral-7B | 80... | 2025.07 | 62.4 |
| LLaMA-3-8B-Lizard | 0.... | 2025.07 | 61.2 |
| Mistral-7B-Lizard | 0.... | 2025.07 | 60.8 |
| LLaMA-3-8B-LoLCATs | 0.... | 2025.07 | 52.8 |
| Mistral-7B-LoLCATs | 0.... | 2025.07 | 51.4 |
| Liger-GLA-Llama-3-8B | 0.... | 2025.07 | 43.4 |
| Mamba2-LLaMA-3-8B | 20... | 2025.07 | 43.2 |
| TransNormerLLM-7B | 14... | 2025.07 | 43.1 |
| Griffin-7B | 30... | 2025.07 | 39.3 |
| Liger-GLA-Mistral-7B | 0.... | 2025.07 | 36.3 |
| Hawk-7B | 30... | 2025.07 | 35 |
| Mistral-7B-SUPRA | 10... | 2025.07 | 34.2 |
| Mamba-7B | 12... | 2025.07 | 33.3 |