| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Spatial Reasoning | CV-Bench | Accuracy | 92 | 61 |
| Spatial Reasoning | CV-Bench-3D | Accuracy | 96.3 | 32 |
| Spatial Reasoning | CV-Bench 2D | Accuracy | 94 | 22 |
| Computer Vision Perception | CV-Bench | Score | 89 | 22 |
| Computer Vision Evaluation | CV-Bench | Average Score | 85.8 | 22 |
| Vision-centric Evaluation | CV-Bench | Accuracy | 0.864 | 21 |
| Vision-Language Evaluation | CV-Bench | Accuracy | 90.1 | 17 |
| Vision-Centric Evaluation | CV-Bench 2D | Score | 63.8 | 15 |
| Spatial Understanding | CV-Bench 2D Overall | Accuracy | 75.4 | 15 |
| Single-image spatial reasoning | CV-Bench | 2D Accuracy | 80.7 | 15 |
| Spatial VQA | CV-Bench-2D Relation (Level 2) | Accuracy | 96.9 | 14 |
| Spatial Reasoning | CV-Bench (test) | 2D Score | 83.6 | 14 |
| Multimodal Perception | CV-Bench | Accuracy | 89.57 | 13 |
| Spatial Perception | CV-Bench Average | Accuracy | 85.5 | 12 |
| Spatial Perception | CV-Bench 3D | Accuracy | 92.2 | 12 |
| Spatial Perception | CV-Bench 2D | Accuracy (%) | 79.7 | 12 |
| Vision-centric Reasoning | CV Bench | Accuracy | 83.8 | 12 |
| Visual Understanding | CV-Bench | Accuracy | 86.96 | 12 |
| Spatial Understanding | CV-Bench 3D (test) | Average Score | 91.3 | 11 |
| Spatial Reasoning | CV-Bench SI 58 | Accuracy | 81.1 | 11 |
| Spatial Understanding | CV-Bench v1 (test) | Relational Score | 94 | 11 |
| Spatial Relationship Understanding | CV-Bench | 2D Relational Score | 93.85 | 9 |
| Multi-modal Reasoning | CV-Bench | Overall Accuracy | 86.5 | 6 |
| Spatial Reasoning | CV-Bench | Average Spatial Score | 75.6 | 5 |
| General VQA | CV-Bench | Accuracy | 90.07 | 5 |