| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Chinese Multitask Language Understanding | CMMLU | Accuracy 81.8 | 50 |
| Multitask Language Understanding | CMMLU (test) | Accuracy 78.3 | 38 |
| Language Understanding | CMMLU | Accuracy 90.1 | 27 |
| Multi-task Language Understanding | CMMLU | Accuracy 89.28 | 22 |
| Examination | CMMLU | Score 61.3 | 20 |
| Chinese Language Knowledge and Reasoning | CMMLU | Score 77.01 | 14 |
| General Language Understanding | CMMLU | Overall Accuracy 77.3 | 14 |
| Comprehensive Examination | CMMLU (test) | Accuracy 68.1 | 14 |
| Chinese Language Understanding | CMMLU (test) | CMMLU Score 0.574 | 13 |
| Chinese Language Understanding | CMMLU | Score 90.9 | 10 |
| General Knowledge | CMMLU | Accuracy 88.4 | 9 |
| Comprehensive cognitive reasoning | CMMLU | Score 53.45 | 8 |
| Knowledge | CMMLU | Knowledge Score 84.72 | 6 |
| Medical Knowledge Evaluation | CMMLU Med | Accuracy 86.89 | 5 |
| Chinese General Knowledge | CMMLU | Accuracy 90.9 | 4 |
| Knowledge & Reasoning | CMMLU | Accuracy 63.4 | 4 |
| General Domains | CMMLU | Accuracy 0.865 | 4 |
| General Language Understanding | CMMLU 5-shot | Accuracy 90.61 | 3 |
| Language Understanding | CMMLU Cantonese | Accuracy (Humanities) 27.72 | 3 |
| Downstream Performance Prediction | CMMLU | MSE 0.0033 | 3 |
| Multilingual Understanding | CMMLU | Score 72 | 2 |
| Multilingual Knowledge | CMMLU | CMMLU Score 71.8 | 2 |
| Chinese Massive Multitask Language Understanding | CMMLU | CMMLU Score 57.4 | 2 |