| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| 3D Question Answering | SQA3D (test) | EM@1 | 64.8 | 98 |
| 3D Question Answering | SQA3D | EM | 61.3 | 69 |
| 3D Situated Question Answering | SQA3D (test) | Average Accuracy | 54.57 | 40 |
| 3D Question Answering | SQA3D | Exact Match (EM) | 60.6 | 21 |
| Situated 3D Question Answering | SQA3D | EM | 62.3 | 18 |
| 3D Scene Understanding | SQA3D | EM-1 | 60.7 | 14 |
| Vision-Language Reasoning | SQA3D ScanNet scenes (test) | BLEU-1 | 54.9 | 13 |
| Situated 3D Question Answering | SQA3D (test) | EM@1 | 63 | 12 |
| Multimodal Compositional QA | SQA3D (test) | Accuracy | 41 | 9 |
| 3D Situated Question Answering | SQA3D | What Accuracy | 47.7 | 9 |
| Language-based Localization | SQA3D (test) | Accuracy @ 0.5m | 42.6 | 8 |
| 3D Question Answering | SQA3D v1.0 (test) | EM@1 | 59 | 8 |
| 3D Visual Question Answering | SQA3D | EM@1 | 52.5 | 8 |
| Situated Question Answering | SQA3D (test) | EM | 54.6 | 7 |
| Orientation | SQA3D 1.0 (test) | Acc @ 15° | 28.7 | 5 |
| Localization | SQA3D 1.0 (test) | Accuracy @ 0.5m | 0.274 | 5 |
| QA-driven Grounding | SQA3D | F1@50 | 51.6 | 3 |
| Question Answering | SQA3D | EM-R | 39.7 | 3 |
| Visual Grounding | SQA3D-G | F1@50 | 51.6 | 2 |