| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| High-resolution perception | V* | Overall Score | 89.53 | 20 |
| Vision-Intensive Perception | V* Benchmark | Attr Score | 84.4 | 18 |
| Semantic Segmentation | V20 | mIoU | 83.8 | 15 |
| Visual Reasoning | V* cross-domain (test) | Accuracy | 79.06 | 15 |
| Visual Reasoning | V* | Accuracy | 81.15 | 14 |
| Fine-grained Perception | V* | Accuracy | 78.8 | 13 |
| Visual Perception | V* | Score | 89 | 12 |
| Visual Search | V* | Average Success | 90.6 | 11 |
| Visual Reasoning | V* (test) | Overall Score | 92.2 | 11 |
| Perception | V* (test) | Accuracy | 86.9 | 11 |
| Visual Reasoning | V* | Overall Score | 95.7 | 10 |
| Visual Question Answering | V* | Accuracy | 49.73 | 10 |
| Visual Search | V* bench (test) | Attribute Rate | 87 | 10 |
| Fine-grained Visual Reasoning | V* | Accuracy | 89 | 8 |
| Multimodal Multi-choice | V* | Accuracy | 84.3 | 8 |
| Visual Search and Comprehension | V* | Accuracy | 89.8 | 8 |
| Multimodal reasoning | V* | Pass@1 | 89.5 | 7 |
| Visual Search | V* benchmark | Attribute Success Rate | 75.65 | 5 |