| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Understanding | Llama-3.1-70B Evaluation Suite (MMLU, WinoGrande, HellaSwag, ARC-Easy, ARC-Challenge) | MMLU: 78.58 | 7 |
| Language Understanding and Code Generation | Llama 3.2 1B Evaluation Suite (ARC, HellaSwag, MMLU, TruthfulQA, WinoGrande, HumanEval) | ARC: 39.33 | 6 |