An End-to-End Transformer Model for 3D Object Detection
About
We propose 3DETR, an end-to-end Transformer-based object detection model for 3D point clouds. Compared to existing detection methods that employ a number of 3D-specific inductive biases, 3DETR requires minimal modifications to the vanilla Transformer block. Specifically, we find that a standard Transformer with non-parametric queries and Fourier positional embeddings is competitive with specialized architectures that employ libraries of 3D-specific operators with hand-tuned hyperparameters. Moreover, 3DETR is conceptually simple and easy to implement, enabling further improvements by incorporating 3D domain knowledge. Through extensive experiments, we show that 3DETR outperforms the well-established and highly optimized VoteNet baselines on the challenging ScanNetV2 dataset by 9.5%. Furthermore, we show that 3DETR is applicable to 3D tasks beyond detection and can serve as a building block for future research.
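To make "non-parametric queries" and "Fourier positional embeddings" concrete, here is a minimal sketch under stated assumptions: query locations are picked from the input cloud by farthest point sampling (no learned query parameters), and each location is encoded with random Fourier features, gamma(p) = [sin(2*pi*B p), cos(2*pi*B p)], where B is a fixed Gaussian matrix. The function names, the number of frequencies, the Gaussian scale and the unit-cube normalization are illustrative assumptions, not 3DETR's actual configuration, and any learned layers applied to the embeddings are omitted.

```python
import math
import random


def farthest_point_sampling(points, k, seed=0):
    """Greedily pick k point indices that are spread out over the cloud."""
    rng = random.Random(seed)
    n = len(points)
    chosen = [rng.randrange(n)]  # random starting point
    # dist[i] = distance from point i to its nearest already-chosen point
    dist = [math.dist(p, points[chosen[0]]) for p in points]
    for _ in range(1, k):
        nxt = max(range(n), key=lambda i: dist[i])  # farthest remaining point
        chosen.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return chosen


def make_fourier_matrix(num_freqs, scale=1.0, seed=0):
    """Sample a fixed random Gaussian projection B with shape (num_freqs, 3)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(3)] for _ in range(num_freqs)]


def fourier_embed(xyz, B):
    """Map a 3D point p to [sin(2*pi*B p), cos(2*pi*B p)] (length 2 * num_freqs)."""
    proj = [2.0 * math.pi * sum(b * x for b, x in zip(row, xyz)) for row in B]
    return [math.sin(v) for v in proj] + [math.cos(v) for v in proj]


# Usage: a toy cloud of 200 points in the unit cube (coordinates assumed normalized).
rng = random.Random(42)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
B = make_fourier_matrix(num_freqs=8)
query_idx = farthest_point_sampling(cloud, k=16)
queries = [fourier_embed(cloud[i], B) for i in query_idx]
print(len(queries), len(queries[0]))  # 16 query embeddings, each of length 16
```

Because the queries are derived from sampled input points rather than stored as learned vectors, the number of queries can be chosen freely per scene; the fixed sinusoidal encoding gives the Transformer a smooth, high-frequency representation of 3D position.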
Related benchmarks
| Task | Dataset | Metric | Value (%) | Rank |
|---|---|---|---|---|
| 3D Object Detection | ScanNet V2 (val) | mAP@0.25 | 65 | 352 |
| Shape Classification | ModelNet40 (test) | OA | 92.1 | 255 |
| 3D Object Detection | SUN RGB-D (val) | mAP@0.25 | 59.1 | 158 |
| 3D Object Detection | ScanNet | mAP@0.25 | 65 | 123 |
| 3D Object Detection | SUN RGB-D | mAP@0.25 | 59.1 | 104 |
| 3D Object Detection | SUN RGB-D v1 (val) | mAP@0.25 | 59.1 | 81 |
| Object Detection | ScanNet v2 (test) | AP@0.50 | 37.9 | 70 |
| 3D Object Detection | ScanNet (val) | mAP@0.25 | 65 | 66 |
| 3D Object Detection | SUN RGB-D (test) | mAP@0.25 | 59.1 | 64 |
| 3D Object Detection | ScanNet V2 | AP50 | 47 | 54 |
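The detection metrics above are thresholded on 3D intersection-over-union (IoU): under mAP@0.25 a predicted box counts as a true positive if its IoU with a matching ground-truth box is at least 0.25, and under AP@0.50 (AP50) the bar is 0.50. The sketch below illustrates only that thresholding step; it assumes axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax), uses a hypothetical helper name, and does not handle oriented boxes or compute the full precision-recall curve behind AP.

```python
def volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d_axis_aligned(a, b):
    """3D IoU of two axis-aligned boxes; 0.0 if they do not overlap."""
    inter = 1.0
    for d in range(3):
        lo = max(a[d], b[d])          # overlap start along axis d
        hi = min(a[d + 3], b[d + 3])  # overlap end along axis d
        if hi <= lo:
            return 0.0
        inter *= hi - lo
    return inter / (volume(a) + volume(b) - inter)


# A 2x2x2 ground-truth box and a prediction shifted by 1 along x.
gt = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
pred = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
iou = iou_3d_axis_aligned(gt, pred)  # overlap 4 / union 12, i.e. about 0.333
for thr in (0.25, 0.50):
    print(thr, iou >= thr)  # true positive at 0.25, false positive at 0.50
```

The same prediction is therefore a hit under the 0.25 threshold but a miss under 0.50, which is why AP50 numbers are typically much lower than mAP@0.25 numbers for the same model.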