| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Understanding | MMMU | Accuracy 81.8 | 437 |
| Multi-discipline Multimodal Understanding | MMMU | Accuracy 84.2 | 317 |
| Multi-discipline Multimodal Understanding | MMMU (val) | Accuracy 81.7 | 204 |
| Massive Multi-discipline Multimodal Understanding | MMMU | Accuracy 65.5 | 152 |
| Multimodal Understanding | MMMU (val) | MMMU Score 85.2 | 152 |
| Multimodal Reasoning | MMMU (val) | Accuracy 78.2 | 144 |
| Multimodal Reasoning | MMMU | Accuracy 83.89 | 130 |
| Multimodal Understanding | MMMU (test) | MMMU Score 69.6 | 112 |
| Multimodal Reasoning | MMMU Pro | Accuracy 85.6 | 107 |
| Multimodal Understanding | MMMU | MMMU Score 62.5 | 78 |
| Multimodal Understanding | MMMU | MMMU Score 60.74 | 69 |
| Multi-discipline Multimodal Understanding | MMMU Pro | Accuracy 67.3 | 66 |
| Vision Understanding | MMMU | Accuracy 72.9 | 65 |
| Multimodal Understanding | MMMU | MMMU Score 81.8 | 59 |
| Multi-agent discussion attack | MMMU | Delta Accuracy 2.3 | 48 |
| Video reasoning | Video-MMMU | Accuracy 84.6 | 45 |
| Medical Visual Question Answering | MMMU Health & Medicine (test) | Accuracy 74.5 | 39 |
| Multimodal Understanding | MMMU | Accuracy 56.8 | 38 |
| Visual Question Answering | MMMU (val) | Accuracy 69.1 | 38 |
| Visual Question Answering | MMMU | Accuracy 81.7 | 37 |
| Multimodal Reasoning | MMMU (test) | Accuracy 64.7 | 34 |
| Multi-discipline Reasoning | MMMU | Accuracy 36.1 | 34 |
| Multi-discipline Multimodal Reasoning | MMMU | Accuracy 61.3 | 33 |
| Over-refusal evaluation | MMMU in-scope (test) | Math Score 37 | 32 |
| General Reasoning | MMMU | Overall Score 75.4 | 32 |