| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multimodal Understanding | MMStar | Accuracy | 82 | 407 |
| Multimodal Reasoning | MMStar | Accuracy | 82 | 143 |
| Multimodal Evaluation | MMStar | Accuracy | 69.5 | 139 |
| Visual Question Answering | MMStar | Accuracy | 91.9 | 100 |
| Multimodal Reasoning | MMStar | Accuracy | 77.1 | 78 |
| General image understanding | MMStar | Accuracy | 72.13 | 58 |
| Image Understanding | MMStar | Score | 65.1 | 54 |
| Visual Reasoning | MMStar | Accuracy | 69 | 51 |
| General Visual Reasoning | MMStar | Accuracy | 77.5 | 46 |
| General Task | MMStar | Accuracy | 76.2 | 36 |
| General Visual Question Answering | MMStar | Score | 77.8 | 35 |
| General Reasoning | MMStar | Score | 69.2 | 32 |
| Multimodal Understanding | MMStar | Average Score | 68.01 | 31 |
| Visual Perception | MMStar | Accuracy | 73.07 | 30 |
| Perception | MMStar latest (test) | CP | 67.2 | 30 |
| Multimodal Reasoning | MMStar | Accuracy | 75.2 | 29 |
| Multi-modal Visual Capability | MMStar | Score | 63.9 | 29 |
| Multi-modal Reasoning | MMStar | Accuracy | 63.78 | 28 |
| Image Reasoning | MMStar | Accuracy | 71.85 | 27 |
| Multimodal Understanding | MMStar | Score | 68.33 | 26 |
| General VQA | MMStar | Accuracy | 74.3 | 26 |
| Multimodal Understanding | MMStar (test) | Accuracy | 71.6 | 26 |
| Multimodal Reasoning | MMStar | Accuracy | 72.8 | 25 |
| Vision-Language Perception and Reasoning | MMStar | Accuracy (MMStar) | 64.3 | 23 |
| Visual Grounding | MMStar | Accuracy | 69.07 | 22 |