| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| 3D Visual Grounding | ScanRefer (val) | Overall Accuracy @ IoU 0.50 | 66.83 | 192 |
| 3D Visual Grounding | ScanRefer | Acc@0.5 | 58.2 | 142 |
| 3D Dense Captioning | ScanRefer (val) | CIDEr | 106.11 | 91 |
| 3D Visual Grounding | ScanRefer Unique | Acc@0.25 (IoU=0.25) | 90.3 | 41 |
| 3D Visual Grounding | ScanRefer Overall | Acc @ 0.25 | 71.6 | 41 |
| Referring 3D Instance Segmentation | ScanRefer (val) | mIoU | 74.6 | 37 |
| Visual Grounding | ScanRefer v1 (val) | Acc@0.5 (All) | 57 | 30 |
| 3D Dense Captioning | ScanRefer (test) | CIDEr | 86.28 | 30 |
| 3D Referring Expression Comprehension | ScanRefer | Overall@0.25 Accuracy | 58.47 | 21 |
| 3D Visual Grounding | ScanRefer (test) | Unique Accuracy | 91.9 | 21 |
| 3D Dense Captioning | ScanRefer | CIDEr@0.5IoU | 78.8 | 21 |
| 3D Visual Grounding | ScanRefer Multiple | Accuracy @ IoU=0.25 | 60.8 | 17 |
| 3D Referring Expression Segmentation | ScanRefer | mIoU | 50.5 | 16 |
| 3D Visual Grounding | ScanRefer Multiple (val) | Accuracy @ IoU 0.25 | 52 | 15 |
| 3D Visual Grounding | ScanRefer v1 (test) | Unique Acc@0.5IoU | 70.9 | 15 |
| Visual Grounding | ScanRefer Overall category (test) | Accuracy | 61.1 | 13 |
| 3D Object Grounding | ScanRefer detected proposals v1 (val) | Unique Acc@0.25 | 88.63 | 10 |
| 3D Object Detection | ScanRefer (test) | mAP@0.5 | 53.95 | 10 |
| Referring Expression Segmentation | ScanRefer | mIoU | 44.8 | 9 |
| Scene Retrieval | ScanRefer (n=10) | Recall@1 | 33.4 | 8 |
| Scene Retrieval | ScanRefer (n=5) | Recall@1 | 22.4 | 8 |
| 3D Referring Expression Segmentation (3DRES) | ScanRefer Multiple subset (val) | Overall Accuracy @0.25 | 55.33 | 7 |
| 3D Referring Expression Segmentation | ScanRefer Multiple | Acc@25 | 0.55 | 7 |
| 3D Visual Grounding | ScanRefer 250 scenes (test) | Acc@0.25 (Unique) | 87.9 | 7 |
| 3D Dense Captioning | ScanRefer Oracle DC | CIDEr | 87.09 | 7 |