| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | SAT-Math | SAT Math Accuracy: 97.66 | 47 |
| Spatial Mental Modeling | SAT-Real | AVG: 93.9 | 41 |
| Image Classification | SAT-6 (test) | Accuracy: 99.84 | 21 |
| Mathematical Reasoning | SAT | Accuracy: 98.2 | 18 |
| Reasoning | SAT | Accuracy (SAT): 97.6 | 17 |
| Spatial Aptitude | SAT | Accuracy: 92 | 17 |
| Data Contamination Detection | SAT | F1 Score: 79 | 16 |
| Image Classification | SAT6 | Accuracy: 96.75 | 16 |
| Spatial Reasoning | SAT Real | Accuracy (Pass@1): 68.67 | 15 |
| Off-policy evaluation for classification error | sat | Bias: -0.007 | 15 |
| Spatial Mental Modeling | SAT (synthesized) | EgoM: 95.4 | 15 |
| Analogy recognition | SAT | Accuracy: 60.78 | 15 |
| Visual Question Answering | SAT Real | Accuracy: 84.1 | 13 |
| Spatial Reasoning | SAT | Val Metric Score: 87.7 | 12 |
| Visual Understanding | SAT | Accuracy: 73.3 | 11 |
| Spatial Reasoning | SAT | Overall Acc: 80 | 11 |
| Spatial Reasoning | SAT ood (test) | Accuracy: 79.7 | 11 |
| Analogy Generation | SAT (test) | Accuracy: 91 | 11 |
| Analogy Generation | SAT | Accuracy: 0.91 | 11 |
| Spatial Understanding | SAT | Score: 88 | 10 |
| 3D/4D Video Question Answering | SAT | Accuracy: 64.8 | 8 |
| Spatial Reasoning | SAT iid (val) | Accuracy: 92.7 | 8 |
| Spatial Reasoning | SAT (test) | Accuracy: 75.33 | 7 |
| Spatial Reasoning | SAT (val) | Accuracy: 93.48 | 7 |
| 3D Task | SAT | Accuracy: 75.33 | 7 |