
Cross Modal Transformer: Towards Fast and Robust 3D Object Detection

About

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without an explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The multi-modal tokens are spatially aligned by encoding 3D point coordinates into the multi-modal features. The core design of CMT is simple, yet it performs strongly: it achieves 74.1% NDS (state of the art among single models) on the nuScenes test set while maintaining fast inference speed. Moreover, CMT remains strongly robust even when the LiDAR input is missing. Code is released at https://github.com/junjie18/CMT.
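The abstract says CMT aligns image and point cloud tokens by encoding 3D point coordinates into their features instead of performing an explicit view transformation. The sketch below shows one way such a coordinate encoding can be computed, assuming a PETR-style scheme: each image pixel is back-projected along a set of depth bins with a pinhole camera model and a camera-to-ego transform, the resulting points are normalized to a detection range and fed to a small MLP, and LiDAR BEV tokens take coordinates from their grid-cell centers in the same ego frame. All function names, the depth bins, the detection range, the BEV grid layout, and the one-layer MLP are hypothetical illustrations, not code from the CMT repository.

```python
# Assumption: PETR-style coordinate encoding; names and parameters are hypothetical,
# not taken from the CMT codebase.
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def pixel_ray_points(
    u: float, v: float, depths: Sequence[float],
    fx: float, fy: float, cx: float, cy: float,
    rotation: Matrix3, translation: Vec3,
) -> List[Vec3]:
    """Back-project pixel (u, v) to one ego-frame 3D point per depth bin."""
    # Ray through the pixel at unit depth: K^-1 @ [u, v, 1] for a pinhole camera.
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    points = []
    for d in depths:
        cam = [d * r for r in ray]  # point in camera coordinates at depth d
        # Rigid camera-to-ego transform: R @ cam + t
        ego = tuple(
            sum(rotation[i][k] * cam[k] for k in range(3)) + translation[i]
            for i in range(3)
        )
        points.append(ego)
    return points


def bev_cell_center(ix: int, iy: int, x_min: float, y_min: float,
                    cell: float) -> Tuple[float, float]:
    """Ego-frame (x, y) center of a LiDAR BEV grid cell (hypothetical grid layout)."""
    return (x_min + (ix + 0.5) * cell, y_min + (iy + 0.5) * cell)


def normalize(points: Sequence[Vec3], lo: Vec3, hi: Vec3) -> List[Vec3]:
    """Scale each coordinate into [0, 1] using an assumed detection range [lo, hi]."""
    return [tuple((p[i] - lo[i]) / (hi[i] - lo[i]) for i in range(3)) for p in points]


def coordinate_encoding(points_norm: Sequence[Vec3],
                        weights: Sequence[Sequence[float]],
                        bias: Sequence[float]) -> List[float]:
    """Toy one-layer MLP (linear + ReLU) over the flattened ray points."""
    flat = [c for p in points_norm for c in p]
    return [max(0.0, sum(w * x for w, x in zip(row, flat)) + b)
            for row, b in zip(weights, bias)]


if __name__ == "__main__":
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    pts = pixel_ray_points(1200.0, 450.0, [1.0, 2.0],
                           800.0, 800.0, 400.0, 450.0,
                           identity, (0.0, 0.0, 0.0))
    print(pts)  # [(1.0, 0.0, 1.0), (2.0, 0.0, 2.0)]
```

Because both the image ray points and the BEV cell centers are expressed in one ego frame, their encodings live in a shared coordinate space; that shared space is what lets the transformer relate tokens across modalities without projecting features from one view into another.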

Junjie Yan, Yingfei Liu, Jianjian Sun, Fan Jia, Shuailin Li, Tiancai Wang, Xiangyu Zhang • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
3D Object Detection | nuScenes (val) | NDS | 72.9 | 941
3D Object Detection | nuScenes (test) | mAP | 72 | 829
3D Object Detection | nuScenes v1.0 (test) | mAP | 72 | 210
3D Object Detection | nuScenes v1.0 (val) | mAP (Overall) | 70.3 | 190
3D Object Detection | nuScenes v1.0-trainval (val) | NDS | 72.9 | 87
3D Object Detection | Argoverse 2 (val) | mAP | 36.1 | 62
3D Object Detection | nuScenes-C Fog v1.0 (trainval) | mAP | 66.3 | 13
3D Object Detection | nuScenes Rainy (val) | mAP | 70.5 | 13
3D Object Detection | nuScenes-C Snow v1.0 (trainval) | mAP | 62.6 | 13
3D Object Detection | nuScenes-C Sunlight v1.0 (trainval) | mAP | 63.6 | 13
(10 of 13 rows shown.)
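The benchmarks above report both mAP and NDS. NDS (nuScenes Detection Score) combines mAP with five true-positive error metrics, mATE, mASE, mAOE, mAVE and mAAE (translation, scale, orientation, velocity and attribute errors), as NDS = (1/10) [5 · mAP + Σ (1 − min(1, mTP))], summing over the five error metrics. The helper below is a minimal sketch of that formula; the function name is hypothetical, and the official nuScenes devkit remains the reference implementation.

```python
from typing import Mapping

# The five nuScenes true-positive error metrics (lower is better).
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_detection_score(m_ap: float, tp_errors: Mapping[str, float]) -> float:
    """Compute NDS in [0, 1] from mAP and the five mean TP errors."""
    missing = [k for k in TP_METRICS if k not in tp_errors]
    if missing:
        raise ValueError(f"missing TP metrics: {missing}")
    # Each error is clipped at 1, so a very bad error contributes 0 rather than a negative score.
    tp_scores = sum(1.0 - min(1.0, tp_errors[k]) for k in TP_METRICS)
    # mAP carries half of the total weight (5 of 10).
    return (5.0 * m_ap + tp_scores) / 10.0


if __name__ == "__main__":
    # Illustrative inputs only, not CMT's per-metric results.
    print(nuscenes_detection_score(0.5, {k: 1.0 for k in TP_METRICS}))  # 0.25
```

Scores such as the 72.9 NDS in the table are this value multiplied by 100. Since mAP accounts for half the weight, two methods with similar mAP can still differ noticeably in NDS through their localization and velocity errors.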
