| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Scientific Reasoning | CEval Sci | Score | 66.19 | 20 |
| Scientific Reasoning | CEval Hard | Overall Score | 56.58 | 19 |
| General Knowledge | CEval | Score | 90.4 | 13 |
| Multi-task Language Understanding | CEval | Accuracy | 44.7 | 13 |
| Actuator Inversion | All Environments (Ceval-in) | AER | 0.57 | 8 |
| Language Understanding | CEval | Accuracy | 63.03 | 8 |
| Multiple-choice Question Answering | CEval | Accuracy | 79.86 | 7 |
| Chinese Knowledge | CEval | Accuracy | 74.1 | 6 |
| General Knowledge Evaluation | CEVAL | Accuracy | 85.52 | 5 |
| Medical Knowledge Evaluation | CEVAL Med | Accuracy | 91.46 | 5 |
| General Language Understanding | CEval | Accuracy | 73 | 4 |
| General Domains | CEval | Accuracy | 90.91 | 4 |
| Knowledge Understanding | CEval | Accuracy | 45 | 2 |