| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| 3D Object Detection | EmbodiedScan | AP@0.25 | 34.28 | 13 |
| 3D Video Object Detection | EmbodiedScan v1.0 (test) | P@0.25 | 54.2 | 10 |
| 3D Object Detection | EmbodiedScan | AP (Large-Vocabulary, IoU=0.25) | 0.1907 | 9 |
| 3D Visual Grounding | EmbodiedScan (Full) | Overall AP@25 | 62.18 | 8 |
| Semantic Occupancy Prediction | EmbodiedScan (test) | mIoU | 20.79 | 8 |
| 3D Visual Grounding | EmbodiedScan ARKitScenes | Accuracy @ IoU 0.25 | 28.7 | 7 |
| 3D Visual Grounding | EmbodiedScan official (val) | AP@0.25 (Easy) | 41.66 | 7 |
| 3D Detection | EmbodiedScan | Overall AP@25 | 24.68 | 6 |
| 3D Visual Grounding | EmbodiedScan (test) | Overall AP50 | 42.04 | 5 |
| Multi-view 3D Visual Grounding | EmbodiedScan | Overall Performance | 36.88 | 5 |
| Multi-view Semantic Occupancy Prediction | EmbodiedScan | mIoU | 27.45 | 5 |
| 3D Visual Grounding | EmbodiedScan Mini | Overall AP@25 | 61.28 | 4 |
| 3D Visual Grounding | EmbodiedScan (test) | Overall AP25 | 25.72 | 4 |