| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| General Multimodal Understanding | General Benchmarks | Average Score: 74 | 12 |
| General Language Modeling | General Benchmarks Llama 3.1 8B | Generation Quality Score: 66.5 | 11 |
| Natural Language Understanding and Reasoning | General Benchmarks Italian | ARC-C-it: 37.47 | 6 |
| General Language Understanding | General Benchmarks (MMLU, AlpacaEval, Arena-Hard) | MMLU Accuracy: 73.41 | 4 |