| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| General LLM Capability | General Capability Suite (MMLU, AlpacaEval, GSM8K, MATH, HumanEval) | MMLU 71.86 | 56 | |
| General Capability Evaluation | General Capability Suite (MMLU, GSM8K, HumanEval, IFEval) | Common Average Score 77.78 | 39 | |
| General Capability Evaluation | General Capability Suite (ARC-C, HellaSwag, MMLU, GSM8K) | ARC-C Accuracy 54.27 | 27 | |
| General Knowledge Preservation | General Capability Suite (HS, WG, IFEval, MMLU) | HS Delta 17.7 | 22 | |
| General Language Capability Evaluation | General Capability Suite Aggregate | General Capability Avg. Accuracy 62.51 | 18 | |
| Language Understanding and Reasoning | General Capability Suite (MMLU, TruthfulQA, HellaSwag, ARC-Easy) (test) | MMLU Score 0.082 | 16 | |
| General Capability Evaluation | General Capability Suite | Average Score 71 | 12 | |
| General Language Capability | General Capability Suite (MMLU, GSM8K, GPQA) | MMLU Accuracy 73.6 | 5 | |