| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Large Language Model Evaluation | MMLU, GSM8k, HellaSwag, WinoGrande | Average Score: 78.9 | 58 |
| Language Modeling Evaluation | MMLU, GSM8k, HellaSwag, WinoGrande | MMLU Accuracy: 72.98 | 17 |
| Natural Language Understanding and Mathematical Reasoning | MMLU, GSM8k, HellaSwag, WinoGrande (test) | MMLU Accuracy: 77.18 | 13 |
| Large Language Model Evaluation | MMLU, GSM8k, HellaSwag, WinoGrande (test) | MMLU Accuracy: 86.55 | 13 |