| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-modal Vision-Language Understanding | MMVet | Score 81.3 | 38 |
| Self-evaluation | MMVet | AUROC 0.886 | 36 |
| Multi-modal Understanding | MMVet | Accuracy 76.2 | 35 |
| Multi-modal Reasoning | MMVet (test) | Accuracy 80.8 | 30 |
| Multimodal Understanding | MMVet turbo | Accuracy 74 | 28 |
| Multimodal Understanding | MMVet v2 (0613) | Accuracy 71.8 | 21 |
| Multi-modal Vision-Language Evaluation | MMVet | Accuracy 46.8 | 19 |
| General Visual Question Answering | MMVet 2024b | Score 66.8 | 13 |
| Multimodal Understanding | MMVet | Pass@1 74.94 | 9 |
| Pointwise Scoring | MMVet pointwise | Kendall's Tau 0.974 | 9 |
| Multimodal Comprehension | MMVet | Score 58 | 8 |
| General VQA | MMVet | Score 66.8 | 8 |
| General Visual Question Answering | MMVet turbo | Score 76.2 | 7 |
| Vision Understanding | MMVet v1.0 (test) | Score 36.87 | 6 |
| Universal multi-modal reasoning | MMVet | Pass@1 63.31 | 2 |