| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | AGIEval MATH | Accuracy | 95.7 | 99 |
| Comprehensive Examination | AGIEval (test) | Accuracy | 62.3 | 37 |
| General Reasoning | AGIEval | Exact Match | 70.4 | 33 |
| Mathematical Reasoning | AGIEval-MATH (test) | Accuracy | 93.3 | 31 |
| Natural Language Understanding | AGIEval | Accuracy | 71.6 | 30 |
| General Evaluation | AGIEval | Accuracy | 70.22 | 29 |
| Reasoning | AGIEval English | Score (%) | 74.4 | 21 |
| Human-level Standardized Exam Evaluation | AGIEval | Score | 51.05 | 18 |
| Out-of-Domain Generalization | AGIEval Out-of-Domain Law (test) | Average OOD Accuracy | 43.41 | 16 |
| General Reasoning | AGIEval en | Speedup Ratio | 2.132 | 15 |
| General Reasoning | agieval | Accuracy | 63.71 | 14 |
| Question Answering | AGIEval | Vanilla Accuracy | 43.92 | 14 |
| Mathematical Proficiency | AGIEval MATH Level-5 | Accuracy | 64.45 | 13 |
| Question Answering | AGIEval | Accuracy | 32.11 | 12 |
| Reasoning | AGIEval | AGIEval Reasoning Accuracy | 48.88 | 10 |
| General Intelligence Evaluation | AGIEval (test) | AGIEval (3-shot) | 27 | 8 |
| Question Answering | Agieval Cn | Accuracy | 36.58 | 7 |
| Standardized exam solving | AGIEval | Accuracy | 30.91 | 6 |
| Question Answering | AGIEval (test) | AQUA-RAT | 28.3 | 5 |
| General Intelligence Evaluation | AGIEval G | Accuracy | 72 | 4 |
| Standardized Exam Reasoning | AGIEval 5-shot | LSAT-RC (5-shot) | 27.2 | 3 |
| Reasoning | AGIEval Cn | Normalized Accuracy | 36.44 | 3 |
| General Knowledge | AGIEval En | CoT EM | 77.92 | 3 |
| General Language Understanding | AGIEval 5-shot | Accuracy | 80.22 | 3 |
| Question Answering | AGIEval en (test) | Accuracy | 18.3 | 2 |