| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| 3D Object Detection | EmbodiedScan | AP@0.25 | 34.28 | 13 |
| 3D Video Object Detection | EmbodiedScan v1.0 (test) | P@0.25 | 54.2 | 10 |
| 3D Object Detection | EmbodiedScan | AP (Large-Vocabulary, IoU=0.25) | 0.1907 | 9 |
| Semantic Occupancy Prediction | EmbodiedScan (test) | mIoU | 20.79 | 8 |
| 3D Visual Grounding | EmbodiedScan ARKitScenes | Accuracy @ IoU 0.25 | 28.7 | 7 |
| 3D Visual Grounding | EmbodiedScan official (val) | AP@0.25 (Easy) | 41.66 | 7 |
| Multi-view 3D Visual Grounding | EmbodiedScan | Overall Performance | 36.88 | 5 |
| Multi-view Semantic Occupancy Prediction | EmbodiedScan | mIoU | 27.45 | 5 |
| 3D Visual Grounding | EmbodiedScan (test) | Overall AP25 | 25.72 | 4 |