Cross Modal Transformer: Towards Fast and Robust 3D Object Detection
About
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed by encoding the 3D points into the multi-modal features. The core design of CMT is simple, yet it performs strongly: it achieves 74.1% NDS (state-of-the-art among single models) on the nuScenes test set while maintaining fast inference speed. Moreover, CMT remains robust even when the LiDAR input is missing. Code is released at https://github.com/junjie18/CMT.
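The abstract says spatial alignment comes from encoding 3D points into the token features rather than from an explicit view transformation. The NumPy code below is a minimal sketch of that idea under stated assumptions, not the released CMT code. Each image token is lifted to points along its camera ray at a few assumed depths. Each BEV LiDAR token gets its cell center with z fixed at 0. A small MLP with random weights turns those coordinates into an embedding that is added to the token, and object queries then attend over the concatenated token set. The helper names (`image_token_coords`, `bev_token_coords`, `mlp`, `cross_attention`), the identity `pixel_to_lidar` matrix, the depth bins, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 32  # embedding width (illustrative assumption)


def mlp(x, w1, w2):
    """Two-layer ReLU MLP; a stand-in for a learned coordinate encoder."""
    return np.maximum(x @ w1, 0.0) @ w2


def image_token_coords(h, w, depths, pixel_to_lidar):
    """Lift every image-token location (u, v) to len(depths) 3D points on its ray.

    pixel_to_lidar is a hypothetical 4x4 matrix mapping homogeneous
    (u*d, v*d, d, 1) to LiDAR-frame (x, y, z, w); results are divided by w.
    Returns shape (h*w, len(depths)*3).
    """
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    u = u.reshape(-1, 1).astype(float)
    v = v.reshape(-1, 1).astype(float)
    d = np.broadcast_to(np.asarray(depths, float), (h * w, len(depths)))
    homo = np.stack([u * d, v * d, d, np.ones_like(d)], axis=-1)  # (h*w, D, 4)
    pts = homo @ pixel_to_lidar.T                                   # (h*w, D, 4)
    pts = pts[..., :3] / pts[..., 3:]                               # dehomogenise
    return pts.reshape(h * w, -1)


def bev_token_coords(h, w, cell=1.0):
    """Center (x, y) of each BEV cell, with z fixed at 0 (an assumption)."""
    y, x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    xy = (np.stack([x, y], axis=-1).reshape(-1, 2) + 0.5) * cell
    return np.concatenate([xy, np.zeros((h * w, 1))], axis=1)  # (h*w, 3)


def cross_attention(queries, memory):
    """Single-head scaled dot-product attention from queries to memory tokens."""
    scores = queries @ memory.T / np.sqrt(memory.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ memory, weights


# Toy sizes: one camera with a 4x6 feature map, a 5x5 BEV map, 8 depth bins.
img_h, img_w, bev_h, bev_w = 4, 6, 5, 5
depths = np.linspace(1.0, 50.0, 8)
img_feat = rng.normal(size=(img_h * img_w, C))
bev_feat = rng.normal(size=(bev_h * bev_w, C))

img_xyz = image_token_coords(img_h, img_w, depths, np.eye(4))
bev_xyz = bev_token_coords(bev_h, bev_w, cell=2.0)

# Separate (randomly initialised) encoders per modality map 3D coordinates into
# the shared C-dim space; adding them tags each token with its 3D position.
w_img = (rng.normal(size=(img_xyz.shape[1], C)), rng.normal(size=(C, C)))
w_bev = (rng.normal(size=(3, C)), rng.normal(size=(C, C)))
img_tokens = img_feat + mlp(img_xyz, *w_img)
bev_tokens = bev_feat + mlp(bev_xyz, *w_bev)

memory = np.concatenate([img_tokens, bev_tokens], axis=0)  # one joint token set
queries = rng.normal(size=(10, C))                           # 10 object queries
fused, attn = cross_attention(queries, memory)
print(memory.shape, fused.shape)  # (49, 32) (10, 32)

# Camera-only case: the same queries simply attend over fewer tokens.
cam_only, _ = cross_attention(queries, img_tokens)
print(cam_only.shape)  # (10, 32)
```

Because both modalities end up as one flat token list, dropping the LiDAR tokens only shrinks `memory`, as the last lines show. This illustrates why a token interface tolerates a missing modality; it says nothing about how much accuracy is retained, and a real detector would add a box-regression head on `fused`.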
Junjie Yan, Yingfei Liu, Jianjian Sun, Fan Jia, Shuailin Li, Tiancai Wang, Xiangyu Zhang (2023)
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Object Detection | nuScenes (val) | NDS | 72.9 | 941 |
| 3D Object Detection | nuScenes (test) | mAP | 72 | 829 |
| 3D Object Detection | NuScenes v1.0 (test) | mAP | 72 | 210 |
| 3D Object Detection | nuScenes v1.0 (val) | mAP (Overall) | 70.3 | 190 |
| 3D Object Detection | nuScenes v1.0-trainval (val) | NDS | 72.9 | 87 |
| 3D Object Detection | Argoverse 2 (val) | mAP | 36.1 | 62 |
| 3D Object Detection | nuScenes-C Fog v1.0 (trainval) | mAP | 66.3 | 13 |
| 3D Object Detection | nuScenes Rainy (val) | mAP | 70.5 | 13 |
| 3D Object Detection | nuScenes-C Snow v1.0 (trainval) | mAP | 62.6 | 13 |
| 3D Object Detection | nuScenes-C Sunlight v1.0 (trainval) | mAP | 63.6 | 13 |
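The abstract and the benchmark table report results as NDS (nuScenes Detection Score) and mAP. The nuScenes benchmark defines NDS as a weighted combination of mAP and five mean true-positive error metrics: mATE, mASE, mAOE, mAVE and mAAE (translation, scale, orientation, velocity and attribute error). The formula is NDS = (5·mAP + Σ(1 − min(1, mTP))) / 10. The short function below computes it. The input numbers in the usage lines are made up for illustration and are not CMT results.

```python
TP_METRICS = {"mATE", "mASE", "mAOE", "mAVE", "mAAE"}


def nds(m_ap, tp_errors):
    """nuScenes Detection Score from mAP and the five mean TP errors."""
    if set(tp_errors) != TP_METRICS:
        raise ValueError(f"expected exactly {sorted(TP_METRICS)}")
    # Each error is clipped at 1, so its score contribution lies in [0, 1].
    tp_scores = sum(1.0 - min(1.0, err) for err in tp_errors.values())
    return (5.0 * m_ap + tp_scores) / 10.0


# Hypothetical inputs, for illustration only.
example = {"mATE": 0.3, "mASE": 0.3, "mAOE": 0.3, "mAVE": 0.3, "mAAE": 0.3}
print(f"{nds(0.6, example):.3f}")  # 0.650

worse = dict(example, mAVE=1.5)    # velocity error clipped to 1 -> contributes 0
print(f"{nds(0.6, worse):.3f}")    # 0.580
```

Clipping each error at 1 means a badly wrong TP metric (here mAVE = 1.5) can cost at most 0.1 NDS, while mAP carries half of the total weight.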