
Cross Modal Transformer: Towards Fast and Robust 3D Object Detection

About

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end multi-modal 3D detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. Multi-modal tokens are spatially aligned by encoding 3D point coordinates into the multi-modal features. The core design of CMT is simple, yet it performs strongly: it achieves 74.1% NDS on the nuScenes test set (state-of-the-art for a single model) while maintaining fast inference speed. Moreover, CMT remains robust even when LiDAR input is missing. Code is released at https://github.com/junjie18/CMT.
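The alignment idea in the abstract can be illustrated with a small geometric sketch. It assumes a PETR-style scheme: each image token is lifted to a set of 3D points along its camera ray at a few assumed depth bins, each BEV point-cloud token is represented by its grid-center (x, y) position, and a small MLP maps those coordinates to embeddings of one shared width C. The function names (`image_ray_points`, `bev_points`, `mlp_encode`), the depth bins, the camera parameters, and the random MLP weights are all hypothetical illustrations, not the released CMT code.

```python
import numpy as np

# Illustrative sketch only: names, shapes, and the MLP are hypothetical,
# not taken from the released CMT code.

def image_ray_points(u, v, depths, intrinsics, cam_to_ego):
    """Lift pixel (u, v) to 3D points at each candidate depth.

    depths: (D,) candidate depths along the camera ray (assumed bins)
    intrinsics: (3, 3) camera matrix K
    cam_to_ego: (4, 4) rigid transform from camera frame to ego/LiDAR frame
    Returns a (D, 3) array of points in the ego frame.
    """
    depths = np.asarray(depths, dtype=float)
    # Pixel in homogeneous form scaled by depth: [u*d, v*d, d].
    pix = np.stack([u * depths, v * depths, depths], axis=1)
    # Back-project with K^-1 to get camera-frame coordinates.
    cam_pts = pix @ np.linalg.inv(intrinsics).T
    # Append 1 and apply the rigid camera-to-ego transform.
    cam_h = np.concatenate([cam_pts, np.ones((len(depths), 1))], axis=1)
    return (cam_h @ cam_to_ego.T)[:, :3]


def bev_points(xs, ys):
    """Return (N, 2) (x, y) centers of a BEV token grid."""
    gx, gy = np.meshgrid(xs, ys, indexing="xy")
    return np.stack([gx.ravel(), gy.ravel()], axis=1)


def mlp_encode(coords, w1, w2):
    """Hypothetical two-layer ReLU MLP applied along the last axis."""
    return np.maximum(coords @ w1, 0.0) @ w2


rng = np.random.default_rng(0)
C = 16  # shared embedding width (assumed)
K = np.array([[1000.0, 0.0, 800.0], [0.0, 1000.0, 450.0], [0.0, 0.0, 1.0]])
depths = np.linspace(1.0, 60.0, 8)

# Image token at the principal point: its ray runs straight along the optical axis.
ray = image_ray_points(800.0, 450.0, depths, K, np.eye(4))
img_pe = mlp_encode(ray.reshape(-1), rng.normal(size=(24, 32)), rng.normal(size=(32, C)))

# A 4 x 4 BEV grid of LiDAR tokens covering [-50, 50] m in x and y.
bev = bev_points(np.linspace(-50, 50, 4), np.linspace(-50, 50, 4))
bev_pe = mlp_encode(bev, rng.normal(size=(2, 32)), rng.normal(size=(32, C)))

print(img_pe.shape, bev_pe.shape)  # (16,) (16, 16)
```

The point the sketch tries to capture is that both token types receive embeddings of the same width derived from 3D geometry, so a single transformer decoder could attend over image and LiDAR tokens together without an explicit camera-to-BEV view transformation.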

Junjie Yan, Yingfei Liu, Jianjian Sun, Fan Jia, Shuailin Li, Tiancai Wang, Xiangyu Zhang · 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
3D Object Detection | nuScenes (val) | NDS | 72.9 | 981
3D Object Detection | nuScenes (test) | mAP | 72 | 903
3D Object Detection | nuScenes v1.0 (test) | mAP | 72 | 230
3D Object Detection | nuScenes (val) | NDS | 46 | 217
3D Object Detection | nuScenes v1.0 (val) | mAP (Overall) | 70.3 | 207
3D Object Detection | nuScenes v1.0-trainval (val) | NDS | 72.9 | 182
3D Object Detection | Argoverse 2 (val) | mAP | 36.1 | 101
3D Object Detection | nuScenes LiDAR Beamsreduce | NDS | 60.1 | 41
3D Object Detection | nuScenes Night (val) | mAP | 42.8 | 26
3D Object Detection | nuScenes LiDAR Motionblur | NDS | 63.93 | 24

Showing 10 of 39 rows.
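Several results above are reported as NDS, the nuScenes detection score. As a reading aid, the sketch below follows the metric's public definition: mAP is weighted by 5, and each of the five mean true-positive errors (translation, scale, orientation, velocity, attribute) contributes 1 − min(1, error), with the total divided by 10. The function name is hypothetical, and the inputs in the usage line are made-up values, not results from this paper.

```python
def nuscenes_detection_score(m_ap, tp_errors):
    """Compute NDS from mAP and the five mean true-positive errors.

    NDS = (1/10) * [5 * mAP + sum over TP metrics of (1 - min(1, mTP))]
    tp_errors: (mATE, mASE, mAOE, mAVE, mAAE); lower is better, and
    each error is clipped at 1 so a very poor metric contributes 0.
    """
    if len(tp_errors) != 5:
        raise ValueError("expected five TP error metrics")
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * m_ap + sum(tp_scores)) / 10.0


# Hypothetical inputs, not CMT's reported numbers.
score = nuscenes_detection_score(0.5, [0.2, 0.2, 0.4, 0.3, 1.5])
print(round(score, 4))  # 0.54
```

Because mAP carries half the weight, two detectors with equal mAP can still differ noticeably in NDS through their localization, orientation, and velocity errors.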
