| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| General VQA | MMVet | Score | 83.9 | 63 |
| Multi-modal Understanding | MMVet | Accuracy | 85.67 | 55 |
| Multi-modal Reasoning | MMVet (test) | Accuracy | 80.8 | 49 |
| Multi-modal Vision-Language Evaluation | MMVet | Accuracy | 46.8 | 38 |
| Multi-modal Vision-Language Understanding | MMVet | Score | 81.3 | 38 |
| Self-evaluation | MMVet | AUROC | 0.886 | 36 |
| Multimodal Understanding | MMVet turbo | Accuracy | 74 | 28 |
| Multimodal Understanding | MMVet v2 (0613) | Accuracy | 71.8 | 21 |
| Multimodal Reasoning | MMVet v1 (val) | Accuracy | 33.7 | 19 |
| Multi-modal Reasoning | MMVet | Score | 49.2 | 18 |
| General Multimodal Evaluation | MMVet turbo | Overall Score | 69.7 | 16 |
| Visual Question Answering | MMVet (test) | Score | 67.1 | 16 |
| Visual Language Model Evaluation | MMVet V2 | MMVet V2 Score | 52.6 | 15 |
| Multimodal Understanding | MMVet | MMVet Score | 67.2 | 15 |
| General Visual Question Answering | MMVet 2024b | Score | 66.8 | 13 |
| User Preference & Fluency | MMVet | MMVet User Preference Score | 41.5 | 10 |
| Multimodal Reasoning | MMVet | Token Length | 3,296.8 | 9 |
| Multimodal Understanding | MMVet | Pass@1 | 74.94 | 9 |
| Pointwise Scoring | MMVet pointwise | Kendall's Tau | 0.974 | 9 |
| Multimodal Comprehension | MMVet | Score | 58 | 8 |
| Visual Language Model Evaluation | MMVet | MMVet Score | 40.6 | 7 |
| General Visual Question Answering | MMVet turbo | Score | 76.2 | 7 |
| Vision Understanding | MMVet v1.0 (test) | Score | 36.87 | 6 |
| Vision-language capability | MMVet | Score | 81.2 | 5 |
| Multimodal Understanding | MMVet | Gain Score | 5.97 | 4 |