MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion
About
Multi-view radar-camera fused 3D object detection provides a longer detection range and more informative features for autonomous driving, especially under adverse weather. Existing radar-camera fusion methods offer various designs for combining radar information with camera data. However, these approaches usually rely on straightforward concatenation of the multi-modal features, which neglects semantic alignment of the radar features and sufficient cross-modal correlation. In this paper, we present MVFusion, a novel Multi-View radar-camera Fusion method that achieves semantically aligned radar features and enhances cross-modal information interaction. To this end, we inject semantic alignment into the radar features via a semantic-aligned radar encoder (SARE), producing image-guided radar features. We then propose a radar-guided fusion transformer (RGFT) that fuses the radar and image features to strengthen the correlation between the two modalities at the global scope via a cross-attention mechanism. Extensive experiments show that MVFusion achieves state-of-the-art performance (51.7% NDS and 45.3% mAP) on the nuScenes dataset. We will release our code and trained networks upon publication.
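To make the cross-attention fusion idea concrete, below is a minimal NumPy sketch of single-head cross-attention between two feature sets. It is an illustration only, not the paper's RGFT: the choice of image tokens as queries and radar tokens as keys/values, the random projection matrices standing in for learned weights, and all function and variable names are assumptions for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(img_feats, radar_feats, d_k=16, seed=0):
    """Single-head cross-attention sketch: image tokens attend to radar tokens.

    img_feats:   (N_img, C) image feature tokens
    radar_feats: (N_radar, C) radar feature tokens
    Returns fused features of shape (N_img, C).
    Random projections stand in for learned weights (illustration only).
    """
    rng = np.random.default_rng(seed)
    C = img_feats.shape[1]
    Wq = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wk = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wv = rng.standard_normal((C, C)) / np.sqrt(C)
    Q = img_feats @ Wq            # queries from image features
    K = radar_feats @ Wk          # keys from radar features
    V = radar_feats @ Wv          # values from radar features
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (N_img, N_radar) attention map
    return img_feats + attn @ V              # residual fusion of radar context

# Toy example: 6 image tokens and 4 radar tokens, 32 channels each.
img = np.random.default_rng(1).standard_normal((6, 32))
radar = np.random.default_rng(2).standard_normal((4, 32))
fused = cross_attention_fuse(img, radar)
print(fused.shape)  # (6, 32)
```

Because every image token attends to every radar token, each fused feature can draw on radar context from the whole scene rather than only co-located positions, which is the global-scope interaction the concatenation baseline lacks.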
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| 3D Object Detection | nuScenes (val) | NDS 45.5 | 941 |
| 3D Object Detection | nuScenes (test) | mAP 45.3 | 829 |
| 3D Object Detection | NuScenes v1.0 (test) | mAP 45.3 | 210 |
| 3D Object Detection | nuScenes v1.0 (val) | mAP (Overall) 42.1 | 190 |
| 3D Object Detection | nuScenes v1.0-trainval (val) | NDS 45.5 | 87 |