| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| 3D Visual Grounding | Nr3D (test) | Overall Success Rate | 76.1 | 88 |
| 3D Visual Grounding | Nr3D | Overall Success Rate | 69.9 | 83 |
| 3D Dense Captioning | Nr3D 1 (val) | CIDEr (IoU=0.5) | 54.45 | 22 |
| 3D Visual Grounding | Nr3D without GT object class | Easy Success | 72.5 | 13 |
| 3D Visual Grounding | Nr3D (val) | Easy Score | 70.2 | 13 |
| 3D Dense Captioning | Nr3D (test) | C Score @ 0.5 IoU | 59.48 | 13 |
| 3D dense captioning | Nr3D | C Score (0.5 IoU) | 55.06 | 13 |
| Oracle 3D Dense Captioning | Nr3D (val) | CIDEr | 85.4 | 10 |
| 3DREC | NR3D | Accuracy (0.25 IoU) | 59.91 | 9 |
| Scene Retrieval | Nr3D n=10 | R@1 | 30.7 | 8 |
| Scene Retrieval | Nr3D n=5 | R@1 | 19.7 | 8 |
| 3D Dense Captioning | Nr3D 1 (test) | CIDEr | 52.84 | 7 |
| Viewpoint Grounding | Nr3D | Recall@1 | 25.2 | 6 |
| 3D object grounding | Nr3D | Overall Accuracy (IoU=0.10) | 49.8 | 5 |
| 3D Referring Expression Comprehension | NR3D constrained subset ReferIt3D (test) | Overall Accuracy | 52.6 | 5 |
| Visual Grounding | Nr3D | Top-1 Accuracy | 66 | 3 |
| 3D Scene Question Answering | Nr3D | Similarity Score | 50.6 | 3 |
| 3D Referring Expression Segmentation | NR3D | Acc@0.25 | 57.56 | 2 |
| 3D Referring Expression Comprehension | NR3D | Accuracy @ IoU=0.25 | 59.91 | 2 |