| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Perception | MMVP | Accuracy 69.67 | 47 |
| Multimodal Visual Perception | MMVP | Accuracy 85.33 | 44 |
| Visual Question Answering | MMVP | Accuracy 53.33 | 21 |
| Visual Reasoning | MMVP | Accuracy 86.3 | 19 |
| Multimodal Visual Pattern Understanding | MMVP | Accuracy 80.33 | 16 |
| Spatial Understanding | MMVP | Accuracy 77 | 15 |
| Fine-Grained Perception | MMVP | Accuracy 74.67 | 14 |
| Hallucination | MMVP | Accuracy 72.1 | 13 |
| Vision-Centric Evaluation | MMVP | Score 65.2 | 12 |
| Vision Understanding | MMVP | Accuracy 69.3 | 12 |
| Multimodal Visual Pattern Understanding | MMVP-VLM (test) | Orientation & Direction Acc 0.267 | 12 |
| Fine-grained Perception | MMVP (test) | MMVP Score 75.33 | 11 |
| Perception | MMVP (test) | Accuracy 68.7 | 11 |
| Fine-grained Visual Pattern Recognition | MMVP-VLM | Orientation Score 60 | 11 |
| Vision-centric Reasoning | MMVP | Accuracy 78.33 | 10 |
| Multimodal Multi-choice | MMVP | Accuracy 75.3 | 10 |
| Visual Question Answering | MMVP-VLM | Orientation & Direction Score 26.7 | 10 |
| Visual-centric Reasoning | MMVP | Average Score 28.9 | 9 |
| Visual Perception & Contextual Understanding | MMVP-VLM | Average Score 25.9 | 7 |
| Multimodal Reasoning | MMVP (test) | UPR 0.118 | 6 |
| Visual Perception | MMVP standard (test) | MMVP Score 80.2 | 6 |
| Pose and Translation Estimation | MMVP 1.0 (test) | MPJPE 83 | 6 |
| Image Classification | MMVP | Average Score 37 | 5 |
| General visual question answering | MMVP | Pass@1 70.7 | 5 |
| Visual Pattern Recognition | MMVP-VLM | VP1 Accuracy 26.7 | 5 |