| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Comprehensive Examination | AGIEval (test) | Accuracy | 62.3 | 34 |
| General Reasoning | AGIEval | Exact Match | 70.4 | 33 |
| Natural Language Understanding | AGIEval | Accuracy | 71.6 | 24 |
| Question Answering | AGIEval | Vanilla Accuracy | 43.92 | 14 |
| Mathematical Reasoning | AGIEval MATH | Accuracy | 95.7 | 12 |
| Question Answering | AGIEval | Accuracy | 32.11 | 12 |
| Mathematical Reasoning | AGIEval-MATH (test) | Accuracy | 52.1 | 11 |
| General Evaluation | AGIEval | Accuracy | 70.22 | 8 |
| General Intelligence Evaluation | AGIEval (test) | AGIEval (3-shot) | 27 | 8 |
| Question Answering | AGIEval (test) | AQUA-RAT | 28.3 | 5 |
| General Intelligence Evaluation | AGIEval G | Accuracy | 72 | 4 |
| General Reasoning | agieval | Accuracy | 46.03 | 4 |
| General Language Understanding | AGIEval 5-shot | Accuracy | 80.22 | 3 |