| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Chinese Language Understanding | C-Eval | Accuracy 92.5 | 47 |
| General Knowledge Assessment | C-Eval | Accuracy 92.5 | 37 |
| Multi-level multi-discipline evaluation | C-Eval | Accuracy 81.4 | 28 |
| Language Understanding | C-Eval | C-Eval Score 87.7 | 24 |
| General Language Understanding | C-Eval (val) | Accuracy 78.68 | 18 |
| Chinese Language Knowledge and Reasoning | C-Eval | Overall Score 78.5 | 14 |
| Comprehensive Examination | C-Eval (test) | Accuracy 71.5 | 14 |
| General Knowledge Evaluation | C-Eval (test) | Accuracy 71.8 | 13 |
| Knowledge | C-EVAL | Score 88.12 | 12 |
| Chinese Language Evaluation | C-Eval (val) | C-Eval 0-shot Score 83 | 12 |
| General Knowledge | C-Eval 1.0 (val) | Accuracy 78.68 | 12 |
| Chinese General Knowledge Question Answering | C-Eval | Accuracy 91.82 | 10 |
| General Knowledge Evaluation | C-Eval (val) | Accuracy 34.32 | 8 |
| Chinese Language Understanding | C-Eval (test) | Accuracy 86 | 7 |
| Chinese Language Understanding | C-Eval | Exact Match 91.8 | 6 |
| Language Understanding | C-Eval | Exact Match 92.5 | 4 |
| Knowledge | C-Eval | C-Eval Knowledge Accuracy 0.589 | 4 |
| General Language Understanding | C-Eval 5-shot | Accuracy 0.9249 | 3 |
| Downstream Performance Prediction | C-Eval | MSE 0.0037 | 3 |
| Comprehensive Chinese Evaluation | C-Eval | Accuracy 69 | 2 |
| Comprehensive Chinese Transformer Evaluation | C-Eval | C-Eval Score 55.5 | 1 |