| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| 3D Visual Grounding | ScanRefer (val) | Overall Accuracy @ IoU 0.50: 66.83 | 253 | |
| 3D Visual Grounding | ScanRefer | Acc@0.5: 58.2 | 142 | |
| 3D visual grounding | ScanRefer v1 (test) | Acc@0.25IoU: 58.89 | 96 | |
| 3D Dense Captioning | ScanRefer (val) | CIDEr: 106.11 | 91 | |
| 3D Visual Grounding | ScanRefer Unique | Acc@0.25 (IoU=0.25): 90.3 | 41 | |
| 3D Visual Grounding | ScanRefer Overall | Acc @ 0.25: 71.6 | 41 | |
| Referring 3D Instance Segmentation | ScanRefer (val) | mIoU: 74.6 | 37 | |
| Visual Grounding | ScanRefer v1 (val) | Acc@0.5 (Unique): 84 | 35 | |
| 3D Dense Captioning | ScanRefer (test) | CIDEr: 86.28 | 30 | |
| 3D Referring Expression Segmentation | ScanRefer | Accuracy @ 0.25: 62 | 25 | |
| 3D Referring Expression Comprehension | ScanRefer | Overall@0.25 Accuracy: 58.47 | 21 | |
| 3D Visual Grounding | ScanRefer (test) | Unique Accuracy: 91.9 | 21 | |
| 3D Dense Captioning | ScanRefer | CIDEr@0.5IoU: 78.8 | 21 | |
| 3D Visual Grounding | ScanRefer Multiple | Accuracy @ IoU=0.25: 60.8 | 17 | |
| 3D Visual Grounding | ScanRefer Multiple (val) | Accuracy @ IoU 0.25: 52 | 15 | |
| Dense Captioning | ScanRefer | Caption Score (0.5 IoU): 84.1 | 13 | |
| Visual Grounding | ScanRefer Overall category (test) | Accuracy: 61.1 | 13 | |
| 3D box localization | ScanRefer | Accuracy @ 0.25 IoU: 62.6 | 11 | |
| 3D Object Grounding | ScanRefer detected proposals v1 (val) | Unique Acc@0.25: 88.63 | 10 | |
| 3D Object Detection | ScanRefer (test) | mAP@0.5: 53.95 | 10 | |
| Referring Expression Segmentation | ScanRefer | mIoU: 44.8 | 9 | |
| Scene Retrieval | ScanRefer (n=10) | Recall@1: 33.4 | 8 | |
| Scene Retrieval | ScanRefer (n=5) | Recall@1: 22.4 | 8 | |
| 3D Referring Expression Segmentation (3DRES) | ScanRefer Multiple subset (val) | Overall Accuracy @0.25: 55.33 | 7 | |
| 3D Referring Expression Segmentation | ScanRefer Multiple | Acc@25: 0.55 | 7 | |