CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection

About

4D radar has received significant attention in autonomous driving thanks to its robustness in adverse weather. Because 4D radar produces sparse points and noisy measurements, most studies tackle 3D object detection by integrating camera images and performing modality fusion in BEV space. However, the potential of the radar and of the fusion mechanism remains largely unexplored, which limits further performance gains. In this study, we propose a cross-view two-stage fusion network called CVFusion. In the first stage, we design a radar-guided iterative (RGIter) BEV fusion module to generate high-recall 3D proposal boxes. In the second stage, we aggregate features from multiple heterogeneous views, including points, image, and BEV, for each proposal. These comprehensive instance-level features help refine the proposals and generate high-quality predictions. Extensive experiments on public datasets show that our method outperforms previous state-of-the-art methods by a large margin, with mAP improvements of 9.10% and 3.68% on View-of-Delft (VoD) and TJ4DRadSet, respectively. Our code will be made publicly available.
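
The second-stage idea of gathering per-proposal features from the point, BEV, and image views can be illustrated with a minimal sketch. This is not the authors' implementation: the axis-aligned `Box`, the nearest-cell `grid_lookup`, the mean-offset point feature, the `project` callback, and plain concatenation are all hypothetical simplifications chosen only to make the data flow concrete.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
Grid = Sequence[Sequence[Sequence[float]]]  # grid[row][col] -> feature vector


@dataclass
class Box:
    """Axis-aligned 3D proposal (hypothetical simplification: no yaw)."""
    cx: float
    cy: float
    cz: float
    dx: float
    dy: float
    dz: float

    def contains(self, p: Point) -> bool:
        return (abs(p[0] - self.cx) <= self.dx / 2
                and abs(p[1] - self.cy) <= self.dy / 2
                and abs(p[2] - self.cz) <= self.dz / 2)


def point_view(points: Sequence[Point], box: Box) -> List[float]:
    """Mean offset of in-box radar points from the box centre (zeros if none)."""
    inside = [p for p in points if box.contains(p)]
    if not inside:
        return [0.0, 0.0, 0.0]
    centre = (box.cx, box.cy, box.cz)
    return [sum(p[i] - centre[i] for p in inside) / len(inside) for i in range(3)]


def grid_lookup(grid: Grid, row: int, col: int) -> List[float]:
    """Nearest-cell lookup, clamped to the grid bounds."""
    row = min(max(row, 0), len(grid) - 1)
    col = min(max(col, 0), len(grid[0]) - 1)
    return list(grid[row][col])


def bev_view(bev: Grid, box: Box, cell_size: float) -> List[float]:
    """Feature of the BEV cell under the box centre (assumes grid origin at x=0, y=0)."""
    return grid_lookup(bev, int(box.cy // cell_size), int(box.cx // cell_size))


def image_view(img: Grid, box: Box,
               project: Callable[[Point], Tuple[float, float]]) -> List[float]:
    """Feature at the pixel where the box centre projects; `project` is assumed given."""
    u, v = project((box.cx, box.cy, box.cz))
    return grid_lookup(img, int(round(v)), int(round(u)))


def aggregate(box: Box, points: Sequence[Point], bev: Grid, img: Grid,
              cell_size: float,
              project: Callable[[Point], Tuple[float, float]]) -> List[float]:
    """Concatenate the three per-proposal views into one instance-level feature."""
    return (point_view(points, box)
            + bev_view(bev, box, cell_size)
            + image_view(img, box, project))
```

In a real detector the concatenated vector would feed a refinement head that adjusts the box and its confidence; the sketch stops at the aggregated feature because the abstract does not specify that head.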

Hanzhi Zhong, Zhiyu Xiang, Ruoyu Xu, Jingyun Fu, Peng Xu, Shaohong Wang, Zhihao Yang, Tianyu Pu, Eryun Liu • 2025

Related benchmarks

Task	Dataset	Result
3D Object Detection	View-of-Delft (VoD) Entire Annotated Area (val)	mAP3D 65.41	115
3D Object Detection	View-of-Delft (VoD) In Driving Corridor (val)	AP3D (Car) 89.86	81
3D Object Detection	TJ4DRadSet (test)	mAP3D 40	71
3D Object Detection	View-of-Delft (VoD) (val)	AP (Car, Entire Area) 60.9	36
BEV Object Detection	TJ4DRadSet (test)	BEV mAP 44.07	32

