| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Spatial Reasoning | MMSI-Bench | Accuracy | 97.2 | 52 |
| Spatial Reasoning | MMSI-Bench | Average Accuracy | 49.5 | 32 |
| Spatial Reasoning (Multi-Image) | MMSI-Bench | Accuracy | 45.2 | 29 |
| Spatial Reasoning | MMSI-Bench (test) | PR Score | 52.8 | 29 |
| Spatial and Temporal Reasoning | MMSI-Bench (test) | Cam-Cam Accuracy | 43 | 25 |
| Spatial Reasoning | MMSI-Bench MindJourney Subset (162 questions) (test) | Accuracy | 0.358 | 19 |
| Image Understanding | MMSI-Bench 68 (test) | Average Score | 69.4 | 12 |
| 3D/4D Video Question Answering | MMSI-Bench | Accuracy | 40.7 | 12 |
| Spatial Reasoning | MMSI-Bench 79 (MV) | Accuracy | 43.3 | 11 |
| Multi-image Understanding | MMSI-Bench | Accuracy | 36.9 | 6 |
| Spatial Understanding | MMSI-Bench | Accuracy | 33.2 | 5 |
| Spatial Understanding | MMSI-Bench | Score | 48 | 5 |
| 3D/4D Visual Question Answering | MMSI Bench 1.0 (test) | Avg Multiple Choice Accuracy | 33.3 | 4 |
| Visual Spatial Reasoning | MMSI-Bench | Cam-Cam Accuracy | 36.6 | 4 |