| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| High-resolution perception | V* | Overall Score | 89.53 | 55 |
| Visual Search | V* benchmark | Overall Success Rate | 91.1 | 54 |
| Visual Reasoning | V* | Accuracy | 92.7 | 52 |
| Visual Question Answering | V* | Accuracy | 74.35 | 45 |
| Visual Perception | V* | Score | 89 | 42 |
| Visual Perception and Reasoning | V* | Overall Accuracy | 90.1 | 36 |
| Visual Grounding | V* | Accuracy | 83.77 | 29 |
| Visual Search | V* | Accuracy | 90.1 | 28 |
| Visual Reasoning | V* | Overall Score | 95.7 | 22 |
| Visual Perception | V* v1.0 (test) | Score | 84.35 | 20 |
| Fine-grained VQA | V* | Accuracy | 93.2 | 18 |
| Vision-Intensive Perception | V* Benchmark | Attr Score | 84.4 | 18 |
| Reasoning | V* | Pass@4 | 97.9 | 16 |
| Perception | V* | Pass@1 | 90.2 | 16 |
| Multimodal Reasoning | V* | Accuracy | 87 | 16 |
| Pixel-centric Understanding | V* | Score | 72.7 | 15 |
| Semantic Segmentation | V20 | mIoU | 83.8 | 15 |
| Visual Reasoning | V* cross-domain (test) | Accuracy | 79.06 | 15 |
| Fine-grained visual search | V* | Overall Score | 91.1 | 14 |
| Perception | V* | Overall Score | 95.7 | 13 |
| High-resolution Visual Search | V* | Top-1 Accuracy | 86.91 | 13 |
| Fine-grained visual reasoning | V* | Avg@8 Overall | 89.5 | 13 |
| Visual Grounding | V* Relative Position 52 | Accuracy | 89.47 | 13 |
| Visual Grounding | V* Direct Attributes 52 | Accuracy | 90.43 | 13 |
| High-resolution Multi-modal Understanding | V* | Accuracy | 80.23 | 13 |