| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| 3D Visual Grounding | ScanRefer (val) | Overall Accuracy @ IoU 0.50: 66.83 | 155 |
| 3D Dense Captioning | ScanRefer (val) | CIDEr: 106.11 | 91 |
| Referring 3D Instance Segmentation | ScanRefer (val) | mIoU: 74.6 | 37 |
| Visual Grounding | ScanRefer v1 (val) | Acc@0.5 (All): 57 | 30 |
| 3D Dense Captioning | ScanRefer (test) | CIDEr: 86.28 | 30 |
| 3D Visual Grounding | ScanRefer Unique | Acc@0.25 (IoU=0.25): 90.3 | 24 |
| 3D Visual Grounding | ScanRefer | Acc@0.25: 66 | 23 |
| 3D Visual Grounding | ScanRefer (test) | Unique Accuracy: 91.9 | 21 |
| 3D Visual Grounding | ScanRefer Overall | Acc @ 0.25: 65.8 | 17 |
| 3D Dense Captioning | ScanRefer | CIDEr@0.5IoU: 54.3 | 16 |
| 3D Visual Grounding | ScanRefer Multiple (val) | Accuracy @ IoU 0.25: 52 | 15 |
| 3D Visual Grounding | ScanRefer v1 (test) | Unique Acc@0.5IoU: 70.9 | 15 |
| 3D Object Grounding | ScanRefer detected proposals v1 (val) | Unique Acc@0.25: 88.63 | 10 |
| 3D Object Detection | ScanRefer (test) | mAP@0.5: 53.95 | 10 |
| Referring Expression Segmentation | ScanRefer | mIoU: 44.8 | 9 |
| 3D Referring Expression Segmentation | ScanRefer Multiple | Acc@250.55 | 7 |
| 3D Visual Grounding | ScanRefer 250 scenes (test) | Acc@0.25 (Unique): 87.9 | 7 |
| 3D Dense Captioning | ScanRefer Oracle DC | CIDEr: 87.09 | 7 |
| 3D Visual Grounding | ScanRefer Box-Level | Accuracy @ IoU 0.25: 55.5 | 6 |
| 3D Visual Grounding | ScanRefer ScanNet v2 (val) | Unique Acc: 93.4 | 5 |
| 3D Object Grounding | ScanRefer ground-truth object proposals | Overall Grounding Accuracy: 59.8 | 4 |
| 3D Visual Grounding | ScanRefer single-view RGBD | Acc@0.5: 31.5 | 4 |
| 3D Visual Grounding | ScanRefer whole scene | Acc@0.5: 29 | 4 |