End-to-End 3D Dense Captioning with Vote2Cap-DETR

About

3D dense captioning aims to generate multiple captions, each localized to its associated object region. Existing methods follow a sophisticated "detect-then-describe" pipeline equipped with numerous hand-crafted components. However, these hand-crafted components can yield suboptimal performance given the cluttered spatial and class distributions of objects across different scenes. In this paper, we propose Vote2Cap-DETR, a simple-yet-effective transformer framework based on the recently popular DEtection TRansformer (DETR). Compared with prior art, our framework has several appealing advantages: 1) Without resorting to numerous hand-crafted components, our method is built on a full transformer encoder-decoder architecture with a learnable vote-query-driven object decoder and a caption decoder that produces the dense captions in a set-prediction manner. 2) In contrast to the two-stage scheme, our method performs detection and captioning in a single stage. 3) Without bells and whistles, extensive experiments on two commonly used datasets, ScanRefer and Nr3D, demonstrate that Vote2Cap-DETR surpasses the current state of the art by 11.13% and 7.11% in CIDEr@0.5IoU, respectively. Code will be released soon.
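The abstract says captions are produced "in a set-prediction manner", following DETR. In DETR-style set prediction, each query's output (here, a box plus a caption) is assigned one-to-one to a ground-truth object by minimum-cost bipartite matching, and unmatched queries are treated as "no object". The sketch below illustrates only that matching step. The box format (center + size, axis-aligned), the cost (center L1 distance plus 1 - IoU), and the exhaustive search are all illustrative assumptions, not Vote2Cap-DETR's actual matching cost; a real model would solve the same assignment with the Hungarian algorithm.

```python
from itertools import permutations

# Hypothetical box format: (cx, cy, cz, dx, dy, dz), axis-aligned.
Box = tuple[float, float, float, float, float, float]


def iou_3d(a: Box, b: Box) -> float:
    """IoU of two axis-aligned 3D boxes given as center + size."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i] - a[i + 3] / 2, b[i] - b[i + 3] / 2)
        hi = min(a[i] + a[i + 3] / 2, b[i] + b[i + 3] / 2)
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    vol_a = a[3] * a[4] * a[5]
    vol_b = b[3] * b[4] * b[5]
    return inter / (vol_a + vol_b - inter)


def match_queries(pred: list[Box], gt: list[Box]) -> dict[int, int]:
    """Return {gt_index: query_index} minimizing total matching cost.

    Exhaustive search over one-to-one assignments: exact, but only
    practical for toy sizes (the Hungarian algorithm scales properly).
    """
    if len(gt) > len(pred):
        raise ValueError("need at least as many queries as objects")

    def cost(q: int, g: int) -> float:
        # Assumed cost: center L1 distance plus (1 - IoU).
        center_l1 = sum(abs(pred[q][i] - gt[g][i]) for i in range(3))
        return center_l1 + (1.0 - iou_3d(pred[q], gt[g]))

    best, best_cost = (), float("inf")
    # Each permutation assigns ground-truth g to query perm[g].
    for perm in permutations(range(len(pred)), len(gt)):
        total = sum(cost(q, g) for g, q in enumerate(perm))
        if total < best_cost:
            best, best_cost = perm, total
    return {g: q for g, q in enumerate(best)}


pred = [(0, 0, 0, 1, 1, 1), (5, 5, 0, 1, 1, 1), (2, 0, 0, 1, 1, 1)]
gt = [(5.1, 5, 0, 1, 1, 1), (0, 0, 0, 1, 1, 1)]
matches = match_queries(pred, gt)
unmatched = set(range(len(pred))) - set(matches.values())
print(matches, unmatched)
```

Because every ground-truth object is claimed by exactly one query, duplicate predictions are penalized during training rather than pruned afterwards; in this toy example query 2 is left unmatched and would be supervised as background.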

Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang YU• 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
3D Dense Captioning | Scan2Cap | CIDEr @0.5 | 61.8 | 96
3D Dense Captioning | ScanRefer (val) | CIDEr | 72.79 | 91
3D Dense Captioning | Scan2Cap (val) | B-4 | 0.345 | 43
3D Dense Captioning | ScanRefer (test) | CIDEr | 86.28 | 30
3D Dense Captioning | Nr3D 1 (val) | CIDEr (IoU=0.5) | 43.84 | 22
3D Dense Captioning | ReferIt3D Nr3D (test) | C Score (0.5 IoU) | 45.53 | 13
3D Dense Captioning | Nr3D (test) | C Score @ 0.5 IoU | 45.53 | 13
3D Dense Captioning | Nr3D 1 (test) | CIDEr | 43.84 | 7
3D Dense Captioning | TOD3Cap Zero-shot OOD (test) | C @ IoU 0.25 | 49.8 | 6
3D Dense Captioning | TOD3Cap In-domain (test) | C (IoU=0.25) | 72.8 | 4
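Most results above are captioning scores gated by localization quality, such as CIDEr at IoU 0.5 (the abstract's CIDEr@0.5IoU). As commonly defined for this benchmark family (the m@kIoU protocol introduced with Scan2Cap), each ground-truth object contributes the caption score of its assigned prediction only if that prediction's box reaches IoU k with the object, and 0 otherwise; the sum is averaged over all N ground-truth objects. The sketch below assumes the per-object caption scores (e.g. CIDEr) are already computed and treats the threshold as inclusive (IoU ≥ k); the function name and input format are hypothetical.

```python
def m_at_k_iou(per_object: list[tuple[float, float]], k: float) -> float:
    """Average caption score over N ground-truth objects, zeroing any
    object whose assigned predicted box has IoU below k.

    per_object: (caption_score, iou) per ground-truth object, where
    caption_score is a precomputed metric value such as CIDEr (assumed).
    """
    if not per_object:
        raise ValueError("need at least one ground-truth object")
    # Inclusive threshold is an assumption of this sketch.
    total = sum(score if iou >= k else 0.0 for score, iou in per_object)
    return total / len(per_object)


objects = [(0.9, 0.7), (0.6, 0.3), (1.2, 0.55)]
print(m_at_k_iou(objects, 0.5))   # second object is zeroed out
print(m_at_k_iou(objects, 0.25))  # all three objects count
```

This is why a model can lose points on a captioning benchmark purely through poor localization: a perfect caption attached to a box below the IoU threshold scores nothing.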
