| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Self-evaluation | CVBench | AUROC 0.747 | 36 |
| Vision Understanding | CVBench-2D | Accuracy 77.76 | 22 |
| Spatial Reasoning | CVBench | 2D Relationship Score 96.31 | 15 |
| Vision | CVBench | CVBench Score 77.2 | 13 |
| Spatial Reasoning | CVBench | Count Score 70.6 | 12 |
| Vision-Language Reasoning | CVBench | Accuracy 86.16 | 12 |
| Perception | CVBench (test) | Accuracy 87.6 | 11 |
| Video Understanding | CVBench 83 (test) | Average Score 69.1 | 10 |
| 3D Task | CVBench | Accuracy 84.52 | 7 |
| Visual Question Answering | CVBench 3D | Accuracy 87 | 7 |
| Visual Question Answering | CVBench 2D | Accuracy 77.7 | 7 |
| Computer Vision Benchmarking | CVBench | Accuracy 83 | 6 |
| Visual Perception | CVBench | 2D Score 76.6 | 5 |
| Spatial reasoning | CVBench | Score 57.3 | 4 |
| Complex Reasoning | cvbench | Accuracy 54.9 | 4 |
| 2D Vision Reasoning | CVBench 2D | Accuracy 58.5 | 4 |
| Counting | CVBench | Counting Score 73.8 | 3 |