| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Multi-task Language Understanding and Reasoning | OpenCompass SIQA, GSM8K, WiC, HumanEval, MMLU, CSQA | SIQA 66.79 | 30 | |
| Multimodal Evaluation Collection | OpenCompass | OpenCompass Score 65.1 | 19 | |
| Reasoning | OpenCompass (test) | CMMLU 69.58 | 11 | |
| Large Language Model Evaluation | OpenCompass | CMMLU 84.88 | 11 | |
| Multimodal Evaluation | OpenCompass | Average Score 69.1 | 10 | |
| Large Model Performance Prediction | OpenCompass 95% masking September 30, 2024 cutoff (temporal split) | RMSE 8.75 | 10 | |
| Visual Question Answering | OpenCompass | MMBench 82.2 | 6 | |
| Multimodal Understanding | OpenCompass | Average Score 67 | 5 | |