| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multimodal Understanding | MMMU | Accuracy | 69.9 | 275 |
| Multi-discipline Multimodal Understanding | MMMU | Accuracy | 84.2 | 266 |
| Multi-discipline Multimodal Understanding | MMMU (val) | Accuracy | 81.7 | 167 |
| Multimodal Reasoning | MMMU (val) | Accuracy | 75 | 114 |
| Multimodal Understanding | MMMU (val) | MMMU Score | 85.2 | 111 |
| Massive Multi-discipline Multimodal Understanding | MMMU | Accuracy | 65.5 | 88 |
| Multimodal Understanding | MMMU (test) | MMMU Score | 69.6 | 86 |
| Multimodal Understanding | MMMU | MMMU Score | 62.5 | 78 |
| Multi-discipline Multimodal Understanding | MMMU Pro | Accuracy | 67.3 | 56 |
| Multimodal Reasoning | MMMU Pro | Accuracy | 76.96 | 55 |
| Multimodal Reasoning | MMMU | Accuracy | 83.89 | 44 |
| Over-refusal evaluation | MMMU in-scope (test) | Math Score | 37 | 32 |
| General Reasoning | MMMU | Overall Score | 75.4 | 32 |
| Video reasoning | Video-MMMU | Accuracy | 84.6 | 32 |
| Multimodal Reasoning | MMMU (test) | Accuracy | 64.7 | 30 |
| Visual Question Answering | MMMU (val) | Accuracy | 69.1 | 29 |
| Vision Understanding | MMMU | Overall Score | 67.4 | 28 |
| Multimodal Understanding | MMMU zero-shot | Zero-shot Accuracy | 73.67 | 26 |
| Medical Visual Question Answering | MMMU H&M | Accuracy | 0.7875 | 25 |
| Multimodal Reasoning | MMMU-Pro | Std-10 Score | 55 | 25 |
| Video Understanding | Video-MMMU | Accuracy | 87.6 | 23 |
| Multi-discipline Multimodal Understanding | MMMU F | Accuracy | 45.9 | 23 |
| Multidisciplinary Knowledge | MMMU | Score | 69.1 | 21 |
| Multimodal Understanding | MMMU Pro | Vis Accuracy | 55.7 | 20 |
| Multi-Discipline Reasoning | MMMU-Pro | Pass@1 | 51.8 | 19 |