| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Understanding | MM-Vet | MM-Vet Score 82.2 | 631 |
| Multimodal Reasoning | MM-Vet | MM-Vet Score 86.2 | 517 |
| Multimodal Capability Evaluation | MM-Vet | Score 85.6 | 393 |
| Multimodal Evaluation | MM-Vet | Score 64 | 196 |
| Visual Understanding | MM-Vet | MM-Vet Score 76.9 | 167 |
| Multimodal Understanding | MM-VET (test) | Total Score 67.6 | 120 |
| Large Multimodal Model Evaluation | MM-Vet | Average Score 54.5 | 69 |
| Multimodal Evaluation | MM-Vet v2 | Score 81.6 | 46 |
| Vision-Language Understanding | MM-Vet | Total Score 72.16 | 43 |
| Visual Reasoning | MM-Vet | Score 82.7 | 40 |
| Visual Question Answering | MM-Vet | MM-Vet ASR Accuracy 75.8 | 33 |
| Multimodal Question Answering | MM-Vet | Total Score 68.3 | 24 |
| Multimodal Understanding | MM-Vet v2 | MM-Vet v2 Score 71.8 | 23 |
| Visual Reasoning and Instruction Following | MM-Vet | Overall Score 75.2 | 23 |
| Multi-modal Reasoning and Understanding | MM-Vet | Accuracy 74.6 | 20 |
| Multi-modal understanding | MM-Vet | Rec 46.9 | 19 |
| Multimodal Judgment | MM-Vet | Overall Score 37.4 | 16 |
| Multi-modal Understanding | MM-Vet v1 (full) | Overall Score (MM-Vet v1) 36.2 | 16 |
| Multimodal Reasoning | MM-Vet | Pass@1 Accuracy 76.2 | 16 |
| Open-ended generation | MM-Vet | MM-Vet Score 45.55 | 14 |
| Multimodal Understanding | MM-Vet OOD | Accuracy 96.33 | 14 |
| Malicious Prompt Detection | MM-Vet OOD | FPR 3.67 | 14 |
| Multimodal Question Answering | MM-Vet (test) | Accuracy 70.3 | 13 |
| Multimodal Understanding | MM-Vet | Relative Speed (RelSpd) 193.2 | 13 |
| Multimodal Reasoning and Tool-use | MM-Vet | MM-Vet Tool-use Score 44.7 | 13 |