
Llama Evaluation Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
--- | --- | --- | ---
Language Understanding | Llama-3.1-70B Evaluation Suite (MMLU, WinoGrande, HellaSwag, ARC-Easy, ARC-Challenge) | MMLU: 78.58 | 7
Language Understanding and Code Generation | Llama 3.2 1B Evaluation Suite (ARC, HellaSwag, MMLU, TruthfulQA, WinoGrande, HumanEval) | ARC: 39.33 | 6