
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

About

We introduce the task of 3D object localization in RGB-D scans using natural language descriptions. As input, we assume a point cloud of a scanned 3D scene along with a free-form description of a specified target object. To address this task, we propose ScanRefer, learning a fused descriptor from 3D object proposals and encoded sentence embeddings. This fused descriptor correlates language expressions with geometric features, enabling regression of the 3D bounding box of a target object. We also introduce the ScanRefer dataset, containing 51,583 descriptions of 11,046 objects from 800 ScanNet scenes. ScanRefer is the first large-scale effort to perform object localization via natural language expression directly in 3D.
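The fusion step described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the descriptor is fused or scored, so the concatenation, the one-hidden-layer scorer, and all names (`score_proposals`, `localize`, `w_hidden`, `w_out`) are assumptions made for illustration, with hand-set toy weights in place of learned ones.

```python
import math


def relu(vec):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in vec]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_proposals(proposal_feats, sentence_emb, w_hidden, w_out):
    """Return one confidence per proposal.

    Assumption: the fused descriptor is the proposal feature concatenated
    with the sentence embedding, scored by a tiny ReLU network. The
    nonlinearity is what lets the sentence change the ranking.
    """
    scores = []
    for feat in proposal_feats:
        fused = feat + sentence_emb  # list concatenation = fused descriptor
        hidden = relu(matvec(w_hidden, fused))
        scores.append(sum(w * h for w, h in zip(w_out, hidden)))
    return softmax(scores)


def localize(proposal_boxes, proposal_feats, sentence_emb, w_hidden, w_out):
    """Pick the box of the proposal that best matches the description."""
    probs = score_proposals(proposal_feats, sentence_emb, w_hidden, w_out)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return proposal_boxes[best], probs


# Toy data: two proposals, 2-D features, 2-D sentence embeddings.
boxes = [(0, 0, 0, 1, 1, 1), (2, 0, 0, 3, 1, 1)]
feats = [[1.0, 0.0], [0.0, 1.0]]
w_hidden = [[1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]]
w_out = [1.0, 1.0]

box_a, probs_a = localize(boxes, feats, [1.0, 0.0], w_hidden, w_out)
box_b, probs_b = localize(boxes, feats, [0.0, 1.0], w_hidden, w_out)
```

With these toy weights, changing only the sentence embedding changes which proposal is selected, which is the behaviour the fused descriptor is meant to provide. In a trained system the weights would be learned, and the selected box would typically be a regressed one rather than a fixed candidate.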

Dave Zhenyu Chen, Angel X. Chang, Matthias Nießner • 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| 3D Visual Grounding | ScanRefer (val) | Overall Accuracy @ IoU 0.50 | 43.31 | 155 |
| 3D Question Answering | ScanQA (val) | CIDEr | 64.9 | 133 |
| 3D Visual Grounding | Nr3D (test) | Overall Success Rate | 34.2 | 88 |
| 3D Visual Grounding | Nr3D | Overall Success Rate | 34.2 | 74 |
| 3D Question Answering | ScanQA w/ objects (test) | EM@1 | 20.56 | 55 |
| 3D Question Answering | ScanQA w/o objects (test) | EM@1 | 19.04 | 51 |
| Visual Grounding | ScanRefer v1 (val) | -- | -- | 30 |
| 3D Visual Grounding | ScanRefer Unique | Acc@0.25 (IoU=0.25) | 67.6 | 24 |
| 3D Visual Grounding | ScanRefer | Acc@0.25 | 37.3 | 23 |
| 3D Question Answering | ScanQA (test) | BLEU-4 | 7.5 | 20 |
Showing 10 of 26 rows
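
The grounding results above are reported as accuracy at an IoU threshold (for example Acc@0.25 or accuracy at IoU 0.50). A minimal sketch of such a metric follows. It assumes axis-aligned boxes stored as `(xmin, ymin, zmin, xmax, ymax, zmax)` and counts a prediction as correct when its IoU with the ground-truth box is at least the threshold. The box format, the inclusive comparison, and the function names are illustrative assumptions, not the benchmark's official evaluation code.

```python
def box_volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    xmin, ymin, zmin, xmax, ymax, zmax = box
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin) * max(0.0, zmax - zmin)


def iou_3d(box_a, box_b):
    """Intersection over union of two axis-aligned 3D boxes."""
    intersection = 1.0
    for axis in range(3):
        low = max(box_a[axis], box_b[axis])
        high = min(box_a[axis + 3], box_b[axis + 3])
        intersection *= max(0.0, high - low)  # zero if no overlap on this axis
    union = box_volume(box_a) + box_volume(box_b) - intersection
    return intersection / union if union > 0 else 0.0


def accuracy_at_iou(predicted, ground_truth, threshold):
    """Percentage of predictions whose IoU with the target meets the threshold."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for pred, gt in zip(predicted, ground_truth)
        if iou_3d(pred, gt) >= threshold
    )
    return 100.0 * hits / len(ground_truth)


# Two 2x2x2 boxes offset by 1 along x: intersection 4, union 12.
pred = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
target = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
overlap = iou_3d(pred, target)
acc_025 = accuracy_at_iou([pred], [target], 0.25)
acc_050 = accuracy_at_iou([pred], [target], 0.50)
```

In this toy case the same prediction counts as a hit at the 0.25 threshold and a miss at 0.50, which is why the two thresholds are reported separately.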
