| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Understanding | MMMU | Accuracy 81.8 | 437 |
| Multi-discipline Multimodal Understanding | MMMU | Accuracy 84.2 | 363 |
| Multimodal Understanding | MMMU | MMMU Score 67.8 | 232 |
| Massive Multi-discipline Multimodal Understanding | MMMU | Accuracy 65.5 | 216 |
| Multi-discipline Multimodal Understanding | MMMU (val) | Accuracy 81.7 | 212 |
| Multimodal Reasoning | MMMU | Accuracy 83.89 | 208 |
| Multimodal Understanding | MMMU (val) | MMMU Score 85.2 | 199 |
| Multimodal Reasoning | MMMU (val) | Accuracy 78.2 | 168 |
| Multimodal Reasoning | MMMU Pro | Accuracy 85.6 | 146 |
| Multimodal Understanding | MMMU (test) | MMMU Score 69.6 | 112 |
| Multimodal Understanding | MMMU | MMMU Score 81.8 | 102 |
| Multi-modal Question Answering | MMMU | Accuracy 82.3 | 83 |
| Multimodal Understanding | MMMU | Accuracy 59.63 | 76 |
| Multimodal Understanding | MMMU | MMMU Score 60.74 | 69 |
| Video reasoning | Video-MMMU | Accuracy 84.6 | 68 |
| Multi-discipline Multimodal Understanding | MMMU Pro | Accuracy 67.3 | 66 |
| Vision Understanding | MMMU | Accuracy 72.9 | 65 |
| Visual Question Answering | MMMU | Accuracy 81.7 | 54 |
| Multimodal Understanding | MMMU | Accuracy (MMMU) 58 | 52 |
| Multi-agent discussion attack | MMMU | Delta Accuracy 2.3 | 48 |
| General Reasoning | MMMU | Overall Score 75.4 | 48 |
| Multimodal Reasoning | MMMU | Accuracy 85.79 | 40 |
| Medical Visual Question Answering | MMMU Health & Medicine (test) | Accuracy 74.5 | 39 |
| Multimodal Understanding | MMMU | Accuracy 56.8 | 38 |
| Multi-discipline reasoning | MMMU (val) | Accuracy 81.8 | 38 |