Language Understanding on PiQA, ARC, HellaSwag, WinoGrande, MMLU
[Chart: Aggregate Accuracy and Aggregate Accuracy (w/o MMLU) over time. Top result: LLaMA-3-8B-Lizard, 75.2 Aggregate Accuracy, Jul 11, 2025.]
Updated 1mo ago
Evaluation Results
Method                 Details                    Date      Aggregate Accuracy   Aggregate Accuracy (w/o MMLU)
LLaMA-3-8B-Lizard      Training Tokens (B)=0....  2025.07   75.2                 73.5
Mamba2-LLaMA-3         Training Tokens (B)=20...  2025.07   73.9                 71
LLaMA-3-8B             Training Tokens (B)=15...  2025.07   73.1                 72
Zamba-7B               Training Tokens (B)=10...  2025.07   71.8                 69.5
StripedHyena-Nous-7B   Training Tokens (B)=–,...  2025.07   67.8                 60.8