| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multimodal Understanding | MMStar | Accuracy | 82 | 197 |
| Multimodal Reasoning | MMStar | Accuracy | 82 | 81 |
| Visual Question Answering | MMStar | Accuracy | 82.96 | 57 |
| Multimodal Evaluation | MMStar | Accuracy | 65.8 | 46 |
| General Reasoning | MMStar | Score | 69.2 | 32 |
| Perception | MMStar latest (test) | CP | 67.2 | 30 |
| General Visual Reasoning | MMStar | Accuracy | 77.5 | 29 |
| Multimodal Reasoning | MMStar | Accuracy | 75.2 | 29 |
| Multimodal Understanding | MMStar | Average Score | 68.01 | 21 |
| Multi-modal Visual Capability | MMStar | Score | 63.9 | 20 |
| Visual Perception | MMStar | Accuracy | 65.7 | 20 |
| Mathematical Reasoning | MMStar Math | Accuracy | 77.2 | 19 |
| Image Understanding | MMStar | Score | 62.49 | 16 |
| Multimodal Perception | MMStar | Accuracy | 83.6 | 16 |
| Multimodal Multi-choice | MMStar | Accuracy | 65.1 | 15 |
| Multidisciplinary Knowledge | MMStar | Score | 68.2 | 15 |
| High-quality Vision-Language Evaluation | MMStar | Score | 68.2 | 14 |
| General Visual Question Answering | MMStar 2024b | Accuracy | 65.3 | 14 |
| General Visual Question Answering | MMStar | Score | 77.8 | 14 |
| Multimodal Reasoning | MMStar (test) | Score | 60.4 | 12 |
| Multimodal Understanding | MMStar (5% Forget Set) | Average Score | 49.27 | 12 |
| General Image Understanding | MMStar | Accuracy | 57.73 | 11 |
| General Perception and Reasoning | MMStar | Score | 68.8 | 11 |
| Multimodal Robustness | MMStar (test) | MMStar Score | 69.8 | 11 |
| Perception | MMStar (test) | Accuracy | 72.3 | 11 |