| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Aggregated Multimodal Evaluation | Multimodal Evaluation Suite Average | Average Relative Performance: 100 | 21 |
| Multimodal Understanding | Multimodal Evaluation Suite (GQA, MMBench, MMBench-CN, MME, POPE, SEED-Bench, TextVQA, VizWiz, OCRBench) | GQA Score: 61.5 | 21 |
| Statistical Significance Analysis | Multimodal Evaluation Suite (Tiny-ImageNet, CIFAR-100, FMNIST, Caltech-256, AG News, MMLU, VQA, CommonGen) (test) | Significance Rate (TAP Better): 100 | 18 |
| Multimodal Understanding | Multimodal Evaluation Suite MMB, MME, SQA, VQA^T, MMB^C, MMVet, MMstar, AI2D | MMB Score: 64.6 | 17 |
| Multimodal Understanding and Reasoning | Multimodal Evaluation Suite (MMVet, MMBench_EN, SEED-Bench, LLaVABench, POPE, MME-P, MMVP, MMStar) (Random Sampling Splits of CC12M) | MMVet Score: 30.1 | 13 |
| Multimodal Understanding | Multimodal Evaluation Suite Table 4 | ALL AVG Score: 73.6 | 9 |
| Multimodal Understanding | Multimodal Evaluation Suite GQA, SQA-I, VQA-T, MME, VQAv2, MMB | GQA Score: 59.7 | 7 |
| Comprehensive Multimodal Evaluation | Multimodal Evaluation Suite Composite | Overall Score: 68.7 | 5 |