| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multimodal Understanding | MM-Vet | MM-Vet Score | 82.2 | 418 |
| Multimodal Capability Evaluation | MM-Vet | Score | 85.6 | 282 |
| Multimodal Reasoning | MM-Vet | MM-Vet Score | 80.8 | 281 |
| Multimodal Evaluation | MM-Vet | Accuracy | 85.6 | 122 |
| Multimodal Understanding | MM-VET (test) | Total Score | 67.6 | 114 |
| Visual Understanding | MM-Vet | MM-Vet Score | 76.9 | 102 |
| Large Multimodal Model Evaluation | MM-Vet | Average Score | 54.5 | 58 |
| Vision-Language Understanding | MM-Vet | Total Score | 72.16 | 43 |
| Visual Reasoning | MM-Vet | Score | 82.7 | 34 |
| Visual Question Answering | MM-Vet | MM-Vet ASR Accuracy | 73.1 | 27 |
| Multimodal Question Answering | MM-Vet | Total Score | 68.3 | 24 |
| Visual Reasoning and Instruction Following | MM-Vet | Overall Score | 75.2 | 23 |
| Multimodal Reasoning and Tool-use | MM-Vet | MM-Vet Tool-use Score | 44.7 | 13 |
| Multimodal Capability Evaluation | MM-Vet 58 | Score | 38.2 | 13 |
| General Evaluation | MM-VET | REC | 39.5 | 12 |
| Multi-modal Reasoning and Understanding | MM-Vet | Accuracy | 53.4 | 12 |
| Multimodal Understanding | MM-Vet (5% Forget Set) | Average Score | 43.9 | 12 |
| Multi-modal understanding | MM-Vet | Rec | 46.9 | 11 |
| Multimodal Understanding | MM-Vet v2 | MM-Vet v2 Score | 71.8 | 11 |
| Multimodal Understanding | MM-Vet | Average Accepted Length (tau) | 3.82 | 10 |
| Conversational Visual QA | MM-Vet (test) | MM-Vet Score | 52.8 | 10 |
| General Visual Question Answering | MM-Vet | Accuracy | 69.1 | 10 |
| 3D Multimodal Comprehension | 3D MM-Vet (test) | Recognition Accuracy | 65.1 | 9 |
| Multimodal Utility Evaluation | MM-Vet benign | FRR | 3.21 | 8 |
| Multimodal Reasoning | MM-Vet | Accuracy | 36.2 | 8 |