
An End-to-End Transformer Model for 3D Object Detection

About

We propose 3DETR, an end-to-end Transformer-based object detection model for 3D point clouds. Compared to existing detection methods that employ a number of 3D-specific inductive biases, 3DETR requires minimal modifications to the vanilla Transformer block. Specifically, we find that a standard Transformer with non-parametric queries and Fourier positional embeddings is competitive with specialized architectures that employ libraries of 3D-specific operators with hand-tuned hyperparameters. Nevertheless, 3DETR is conceptually simple and easy to implement, enabling further improvements by incorporating 3D domain knowledge. Through extensive experiments, we show 3DETR outperforms the well-established and highly optimized VoteNet baselines on the challenging ScanNetV2 dataset by 9.5%. Furthermore, we show 3DETR is applicable to 3D tasks beyond detection, and can serve as a building block for future research.
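The abstract names two ingredients, non-parametric queries and Fourier positional embeddings. Here is a minimal sketch of how such queries could be built. It assumes (as we read the paper, not from its code) that query locations are picked from the input points by farthest point sampling and then encoded with random Gaussian Fourier features in the style of Tancik et al. The function names, the frequency count, the scale, and the fixed seed index are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def normalize_to_unit_cube(points):
    """Rescale each coordinate to [0, 1] using the cloud's own bounding box."""
    lows = [min(axis) for axis in zip(*points)]
    highs = [max(axis) for axis in zip(*points)]
    # A zero-width axis (all points share that coordinate) keeps span 1.0.
    spans = [(h - lo) or 1.0 for lo, h in zip(lows, highs)]
    return [[(v - lo) / s for v, lo, s in zip(p, lows, spans)] for p in points]


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k indices, each new one farthest from all chosen so far."""
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]  # assumption: deterministic seed index; often random in practice
    nearest_d2 = [math.inf] * n  # squared distance to the closest chosen point
    for _ in range(k - 1):
        last = points[chosen[-1]]
        for i, p in enumerate(points):
            d2 = sum((a - b) ** 2 for a, b in zip(p, last))
            nearest_d2[i] = min(nearest_d2[i], d2)
        chosen.append(max(range(n), key=lambda i: nearest_d2[i]))
    return chosen


def gaussian_fourier_matrix(d_in, num_freqs, scale=1.0, seed=0):
    """Random projection B of shape (d_in, num_freqs) with N(0, scale^2) entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(num_freqs)] for _ in range(d_in)]


def fourier_embed(point, B):
    """Encode p as [sin(2*pi*p@B), cos(2*pi*p@B)], a vector of length 2*num_freqs."""
    num_freqs = len(B[0])
    z = [2 * math.pi * sum(point[i] * B[i][j] for i in range(len(point)))
         for j in range(num_freqs)]
    return [math.sin(v) for v in z] + [math.cos(v) for v in z]


# Toy usage: 8 queries from a random 200-point scene, 32-dim embeddings each.
rng = random.Random(1)
cloud = [[rng.uniform(0.0, 5.0) for _ in range(3)] for _ in range(200)]
unit = normalize_to_unit_cube(cloud)
query_idx = farthest_point_sampling(unit, k=8)
B = gaussian_fourier_matrix(d_in=3, num_freqs=16)
queries = [fourier_embed(unit[i], B) for i in query_idx]
print(len(queries), len(queries[0]))  # 8 32
```

The queries are "non-parametric" in the sense that no learned query vectors are stored: they come entirely from the input geometry. To our understanding, 3DETR additionally projects these embeddings through a small learned MLP before the decoder; that step is omitted from this sketch.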

Ishan Misra, Rohit Girdhar, Armand Joulin • 2021

Related benchmarks

Task | Dataset | Metric | Result | Rank
3D Object Detection | ScanNet V2 (val) | mAP@0.25 | 65 | 352
Shape classification | ModelNet40 (test) | OA | 92.1 | 255
3D Object Detection | SUN RGB-D (val) | mAP@0.25 | 59.1 | 158
3D Object Detection | ScanNet | mAP@0.25 | 65 | 123
3D Object Detection | SUN RGB-D | mAP@0.25 | 59.1 | 104
3D Object Detection | SUN RGB-D v1 (val) | mAP@0.25 | 59.1 | 81
Object Detection | ScanNet v2 (test) | AP@0.50 | 37.9 | 70
3D Object Detection | ScanNet (val) | mAP@0.25 | 65 | 66
3D Object Detection | SUN RGB-D (test) | mAP@0.25 | 59.1 | 64
3D Object Detection | ScanNet V2 | AP50 | 47 | 54

(10 of 18 rows shown.)
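Most rows report mAP@0.25, with AP@0.50 (AP50) as a stricter variant: a predicted box counts as correct when its 3D IoU with an unmatched ground-truth box is at least the threshold, and mAP averages the per-class AP. The sketch below illustrates this for a single class under stated assumptions: axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax), greedy matching in descending score order, and VOC-style all-point interpolated AP. Benchmarks with oriented boxes need a rotated-box intersection, and the official evaluation scripts may differ in details; the function names are hypothetical.

```python
def box_volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d(a, b):
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for k in range(3):
        overlap = min(a[k + 3], b[k + 3]) - max(a[k], b[k])
        if overlap <= 0:
            return 0.0  # disjoint along this axis
        inter *= overlap
    return inter / (box_volume(a) + box_volume(b) - inter)


def average_precision(preds, gts, iou_thresh=0.25):
    """AP for one class. preds: list of (score, box); gts: list of boxes."""
    if not gts:
        raise ValueError("need at least one ground-truth box")
    matched = [False] * len(gts)
    tp = []  # 1 for a true positive, 0 for a false positive, in ranked order
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            iou = iou_3d(box, g)
            if iou > best:
                best, best_j = iou, j
        # VOC-style: a hit on an already-matched box is a false positive.
        if best_j >= 0 and best >= iou_thresh and not matched[best_j]:
            matched[best_j] = True
            tp.append(1)
        else:
            tp.append(0)
    precisions, recalls, hits = [], [], 0
    for rank, t in enumerate(tp, start=1):
        hits += t
        precisions.append(hits / rank)
        recalls.append(hits / len(gts))
    # All-point interpolation: each recall step weighted by the best later precision.
    ap, prev_r = 0.0, 0.0
    for i, r in enumerate(recalls):
        if r > prev_r:
            ap += (r - prev_r) * max(precisions[i:])
            prev_r = r
    return ap


gts = [(0, 0, 0, 1, 1, 1), (5, 5, 5, 6, 6, 6)]
preds = [(0.9, (0, 0, 0, 1, 1, 1)),          # exact hit
         (0.8, (10, 10, 10, 11, 11, 11)),    # miss
         (0.7, (5.5, 5, 5, 6.5, 6, 6))]      # IoU 1/3 with the second box
ap25 = average_precision(preds, gts, 0.25)
ap50 = average_precision(preds, gts, 0.50)
print(round(ap25, 3), ap50)  # 0.833 0.5
```

The third prediction overlaps its target with IoU 1/3, so it is a true positive at the 0.25 threshold but a false positive at 0.50. This is why the same model scores lower on AP@0.50 than on mAP@0.25.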

