| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Large Language Model Evaluation | MLLM Evaluation Suite | Average Score (All): 56.7 | 22 |
| Multimodal Question Answering | MLLM Evaluation Suite (GQA, MMB, MMB-CN, MME, POPE, SQA, VQAv2, VQA-Text, VizWiz) (test) | GQA Accuracy: 64.27 | 11 |
| Multimodal Question Answering | MLLM Evaluation Suite (HallBench, MME, TextVQA, ChartQA, AI2D, RealWorldQA, CCBench, OCRVQA, SQA-IMG, POPE) (test) | HallBench: 49.8 | 7 |
| Multimodal Understanding | MLLM Evaluation Suite (GQA, MMBench, MME, POPE, ScienceQA, VQA v2, HRBench-8k, XLRS) | GQA Score: 60.9 | 7 |
| Multimodal Understanding | MLLM Evaluation Suite (GQA, MMBench, MME, POPE, ScienceQA, VQAv2, MMMU, SEED-I) LLaVA-NeXT (test) | GQA Accuracy: 64.2 | 7 |
| Multimodal Understanding | MLLM Evaluation Suite (GQA, MMB, MME, POPE, SQA, VQAv2, VQAText, MMMU, SEED-I, VizWiz) | GQA Score: 65.4 | 4 |
| Multimodal Large Language Model Evaluation | MLLM Evaluation Suite (MME, MMStar, SQA, RealWorldQA, MMMU, MMMU-P, VisuLogic, LogicVista, CRPE, POPE, HallBench) (test) | MME: 74.94 | 4 |
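The "Average Score (All)" entry suggests an unweighted mean over the suite's per-benchmark scores. A minimal sketch of that aggregation, using hypothetical scores (the numbers below are illustrative placeholders, not the actual values behind the 56.7 average):

```python
# Hypothetical per-benchmark scores; placeholders for illustration only.
scores = {"GQA": 64.2, "MME": 74.9, "POPE": 87.0, "SQA": 70.1}

def average_score(results: dict[str, float]) -> float:
    """Unweighted mean over all benchmark scores, in the style of an
    'Average Score (All)' aggregate. Assumes every score shares a
    comparable 0-100 scale; benchmarks on other scales would need
    normalization before averaging."""
    return sum(results.values()) / len(results)

print(round(average_score(scores), 2))  # prints 74.05
```

Note that an unweighted mean treats each benchmark equally regardless of its size or difficulty; suites sometimes weight by question count instead, which the table does not specify.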