Fully Sparse Fusion for 3D Object Detection

About

Currently prevalent multimodal 3D detection methods are built upon LiDAR-based detectors that usually use dense Bird's-Eye-View (BEV) feature maps. However, the cost of such BEV feature maps is quadratic in the detection range, which makes them unsuitable for long-range detection. Fully sparse architectures are gaining attention because they are highly efficient in long-range perception. In this paper, we study how to effectively leverage the image modality in the emerging fully sparse architecture. In particular, using instance queries, our framework integrates well-studied 2D instance segmentation into the LiDAR side, in parallel with the 3D instance segmentation part of the fully sparse detector. This design achieves a uniform query-based fusion framework on both the 2D and 3D sides while maintaining the fully sparse characteristic. Extensive experiments show state-of-the-art results on the widely used nuScenes dataset and the long-range Argoverse 2 dataset. Notably, under the long-range LiDAR perception setting, the inference speed of the proposed method is 2.7× that of other state-of-the-art multimodal 3D detection methods. Code will be released at https://github.com/BraveGroup/FullySparseFusion.
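The quadratic-cost argument can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper. It assumes a square BEV grid covering [-R, R] meters in x and y with a hypothetical cell size of 0.5 m, and uses example ranges of 50 m and 200 m. For contrast, it counts only the occupied cells of a toy point set, which is the kind of quantity a sparse design scales with. The function names `bev_cells` and `occupied_cells` are hypothetical.

```python
import math

def bev_cells(detection_range_m: float, cell_size_m: float) -> int:
    """Cells in a dense square BEV grid covering [-range, range] in x and y.

    Assumption (illustrative): the grid is square and centered on the ego vehicle.
    """
    cells_per_side = math.ceil(2 * detection_range_m / cell_size_m)
    return cells_per_side ** 2  # grows with the square of the range

def occupied_cells(points: list[tuple[float, float]], cell_size_m: float) -> int:
    """Count only the BEV cells that contain at least one point (a sparse view).

    This depends on how many cells the points occupy, not on the grid's total area.
    """
    return len({(math.floor(x / cell_size_m), math.floor(y / cell_size_m))
                for x, y in points})

if __name__ == "__main__":
    near = bev_cells(50, 0.5)   # 200 x 200 cells
    far = bev_cells(200, 0.5)   # 800 x 800 cells
    print(near, far, far / near)  # 40000 640000 16.0
    # Two of these three toy points fall in the same 0.5 m cell.
    print(occupied_cells([(0.1, 0.1), (0.2, 0.3), (10.0, -5.0)], 0.5))  # 2
```

Quadrupling the range multiplies the dense cell count by 16, while the sparse count is determined only by which cells the points actually fall into.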

Yingyan Li, Lue Fan, Yang Liu, Zehao Huang, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang (2023)

Related benchmarks

Task	Dataset	Metric	Result
3D Object Detection	Argoverse 2 (val)	mAP	33.2	101
3D Object Detection	nuScenes Rainy (val)	mAP	23.4	22
3D Object Detection	nuScenes	mAP (All)	70.4	19
3D Object Detection	nuScenes Oracle (All)	mAP	64.7	15
3D Object Detection	nuScenes Rain Oracle	mAP	61.1	15
3D Object Detection	nuScenes Oracle (Night)	mAP (3D)	37.1	15
3D Object Detection	nuScenes Source	mAP	59.6	9
3D Object Detection	nuScenes night	mAP	36.6	9
3D Object Detection	nuScenes Boston	mAP	28.2	9
3D Object Detection	nuScenes Average of Target Domains	mAP	29.4	9
