| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Understanding | MM-Vet | MM-Vet Score 82.2 | 531 |
| Multimodal Reasoning | MM-Vet | MM-Vet Score 86.2 | 431 |
| Multimodal Capability Evaluation | MM-Vet | Score 85.6 | 345 |
| Multimodal Evaluation | MM-Vet | Score 64 | 180 |
| Visual Understanding | MM-Vet | MM-Vet Score 76.9 | 142 |
| Multimodal Understanding | MM-VET (test) | Total Score 67.6 | 120 |
| Large Multimodal Model Evaluation | MM-Vet | Average Score 54.5 | 61 |
| Vision-Language Understanding | MM-Vet | Total Score 72.16 | 43 |
| Visual Reasoning | MM-Vet | Score 82.7 | 40 |
| Visual Question Answering | MM-Vet | MM-Vet ASR Accuracy 75.8 | 33 |
| Multimodal Question Answering | MM-Vet | Total Score 68.3 | 24 |
| Multimodal Understanding | MM-Vet v2 | MM-Vet v2 Score 71.8 | 23 |
| Visual Reasoning and Instruction Following | MM-Vet | Overall Score 75.2 | 23 |
| Multi-modal Reasoning and Understanding | MM-Vet | Accuracy 74.6 | 20 |
| Multi-modal understanding | MM-Vet | Rec 46.9 | 19 |
| Multimodal Reasoning | MM-Vet | Pass@1 Accuracy 76.2 | 16 |
| Multimodal Understanding | MM-Vet OOD | Accuracy 96.33 | 14 |
| Malicious Prompt Detection | MM-Vet OOD | FPR 3.67 | 14 |
| Multimodal Understanding | MM-Vet | Relative Speed (RelSpd) 193.2 | 13 |
| Multimodal Reasoning and Tool-use | MM-Vet | MM-Vet Tool-use Score 44.7 | 13 |
| Multimodal Capability Evaluation | MM-Vet 58 | Score 38.2 | 13 |
| Multimodal Understanding | MM-Vet benign queries | Recognition Score 54.5 | 12 |
| General Evaluation | MM-VET | REC 39.5 | 12 |
| Multimodal Understanding | MM-Vet (5% Forget Set) | Average Score 43.9 | 12 |
| Multimodal Evaluation | MM-Vet v2 | Score 81.6 | 10 |