| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Spatial Reasoning | VSI-Bench | Avg Score | 79.2 | 255 |
| Spatial Reasoning | VSI-Bench 1.0 (test) | Average Score | 79.2 | 101 |
| Visual Spatial Intelligence | VSI-Bench | Average Score | 79.2 | 60 |
| 3D Question Answering | VSI-Bench | Room Size Accuracy | 67.1 | 56 |
| Video Reasoning | VSI-Bench | Accuracy | 43.1 | 51 |
| Video spatial reasoning | VSI-Bench | Average Score | 79.2 | 45 |
| Spatial Reasoning | VSI-Bench tiny | Avg Score | 51.61 | 39 |
| Spatial Reasoning | VSI-Bench (test) | Avg Score | 79.2 | 31 |
| Spatial Reasoning (Video) | VSI-Bench | Accuracy | 79.2 | 30 |
| Video Reasoning | VSI-Bench (test) | Accuracy | 38.6 | 29 |
| Video Visual Question Answering | VSI-Bench | ACC (MCA) | 68.5 | 28 |
| Video Understanding | VSI-Bench | Accuracy | 49.5 | 23 |
| Visual Spatial Understanding | VSI-Bench static 1.0 | Average Score | 89.5 | 20 |
| Spatial Reasoning | VSI-Bench Vanilla regime | Avg Score | 50.5 | 19 |
| Multi-view global spatial scene understanding | VSI-Bench Standard | Relative Distance Accuracy | 70.5 | 17 |
| Multi-view global spatial scene understanding | VSI-Bench | Relative Distance Score | 94.7 | 16 |
| Spatial Reasoning | VSI-Bench 59 (test) | Object Count Score | 72.5 | 14 |
| Fine-grained video-based spatial reasoning | VSI-Bench | Avg Score | 60.6 | 13 |
| Video Spatial Intelligence | VSI-Bench 123 (test) | Object Count | 70 | 13 |
| Visual Spatial Inference | VSI-Bench Tiny video-input | Object Count Score | 69 | 12 |
| Video Understanding | VSI-Bench 67 (test) | Average Score | 52.7 | 12 |
| Spatial Reasoning | VSI-Bench MV 76 | Accuracy | 63.7 | 11 |
| multi-view Visual Question Answering | VSI-Bench (test) | Average Score | 52.9 | 11 |
| Video Scene Identification | VSI-Bench | Accuracy | 38.2 | 10 |
| Spatial Reasoning | VSI-Bench Extra | RDB (Back) | 72 | 9 |