| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Understanding | MLLM Evaluation Suite (HallusionBench, MME, AI2D, RWQA, SQA, POPE, MMBench, CCB, VSR, V7W) (test) | HallusionBench Score: 62.93 | 32 |
| Multimodal Large Language Model Evaluation | MLLM Evaluation Suite | Average Score (All): 56.7 | 22 |
| Multimodal Understanding | MLLM Evaluation Suite GQA MME POPE SQA VQAtext VizWiz MMBen AI2D v1.5 v1.6 Qwen2.5 (test) | GQA Score: 64.8 | 12 |
| Multimodal Question Answering | MLLM Evaluation Suite (GQA, MMB, MMB-CN, MME, POPE, SQA, VQAv2, VQA-Text, VizWiz) (test) | GQA Accuracy: 64.27 | 11 |
| Multimodal Question Answering | MLLM Evaluation Suite (HallBench, MME, TextVQA, ChartQA, AI2D, RealWorldQA, CCBench, OCRVQA, SQA-IMG, POPE) (test) | HallBench: 49.8 | 7 |
| Multimodal Understanding | MLLM Evaluation Suite (GQA, MMBench, MME, POPE, ScienceQA, VQA v2, HRBench-8k, XLRS) | GQA Score: 60.9 | 7 |
| Multimodal Understanding | MLLM Evaluation Suite (GQA, MMBench, MME, POPE, ScienceQA, VQAv2, MMMU, SEED-I) LLaVA-NeXT (test) | GQA Accuracy: 64.2 | 7 |
| Multimodal Understanding and Reasoning | MLLM Evaluation Suite (MME, MMB, VizWiz, POPE, GQA, RQA, VQAT, SQA) standard (test val) | MME Score: 2,375 | 5 |
| Multimodal Understanding | MLLM Evaluation Suite (GQA, MMB, MME, POPE, SQA, VQAv2, VQAText, MMMU, SEED-I, VizWiz) | GQA Score: 65.4 | 4 |
| Multimodal Large Language Model Evaluation | MLLM Evaluation Suite (MME, MMStar, SQA, RealWorldQA, MMMU, MMMU-P, VisuLogic, LogicVista, CRPE, POPE, HallBench) (test) | MME: 74.94 | 4 |