Sparse4D v2: Recurrent Temporal Fusion with Sparse Model

About

Sparse algorithms offer great flexibility for multi-view temporal perception tasks. In this paper, we present an enhanced version of Sparse4D, in which we improve the temporal fusion module by implementing a recursive form of multi-frame feature sampling. By effectively decoupling image features and structured anchor features, Sparse4D enables a highly efficient transformation of temporal features, thereby facilitating temporal fusion solely through the frame-by-frame transmission of sparse features. The recurrent temporal fusion approach provides two main benefits. Firstly, it reduces the computational complexity of temporal fusion from $O(T)$ to $O(1)$, resulting in significant improvements in inference speed and memory usage. Secondly, it enables the fusion of long-term information, leading to more pronounced performance improvements due to temporal fusion. Our proposed approach, Sparse4Dv2, further enhances the performance of the sparse perception algorithm and achieves state-of-the-art results on the nuScenes 3D detection benchmark. Code will be available at https://github.com/linxuewu/Sparse4D.

Xuewu Lin, Tianwei Lin, Zixiang Pei, Lichao Huang, Zhizhong Su • 2023
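
The complexity claim in the abstract (per-frame temporal fusion cost dropping from $O(T)$ to $O(1)$) can be illustrated with a toy sketch. This is not the Sparse4D implementation: the function names, the list-based "frames", the averaging in the multi-frame path, and the `momentum` blend in the recurrent path are all hypothetical stand-ins chosen only to make the cost difference visible. The sketch counts how many feature-sampling passes each strategy needs per incoming frame.

```python
from typing import List, Optional, Tuple


def sample(frame: List[float], anchors: List[int]) -> List[float]:
    """Toy stand-in for image feature sampling: read one value per anchor."""
    return [frame[a] for a in anchors]


def multi_frame_step(history: List[List[float]],
                     anchors: List[int]) -> Tuple[List[float], int]:
    """Non-recurrent fusion: sample every frame in the window, then average.

    Returns the fused features and the number of sampling passes used.
    """
    sampled = [sample(frame, anchors) for frame in history]
    fused = [sum(vals) / len(vals) for vals in zip(*sampled)]
    return fused, len(history)


def recurrent_step(state: Optional[List[float]], frame: List[float],
                   anchors: List[int],
                   momentum: float = 0.5) -> Tuple[List[float], int]:
    """Recurrent fusion: sample only the current frame, blend with carried state.

    The momentum blend is an assumption for illustration, not the paper's method.
    """
    current = sample(frame, anchors)
    if state is None:
        return current, 1
    fused = [momentum * s + (1 - momentum) * c for s, c in zip(state, current)]
    return fused, 1


def run(frames: List[List[float]], anchors: List[int], window: int):
    """Process a sequence with both strategies and record per-frame cost."""
    multi_cost, recurrent_cost = [], []
    state = None
    for t in range(len(frames)):
        # The multi-frame path must keep and re-sample the last `window` frames.
        history = frames[max(0, t - window + 1): t + 1]
        _, cost = multi_frame_step(history, anchors)
        multi_cost.append(cost)
        # The recurrent path keeps only the sparse state from the previous frame.
        state, cost = recurrent_step(state, frames[t], anchors)
        recurrent_cost.append(cost)
    return multi_cost, recurrent_cost, state


if __name__ == "__main__":
    frames = [[float(t)] * 4 for t in range(6)]
    multi_cost, recurrent_cost, _ = run(frames, anchors=[0, 2], window=4)
    print("multi-frame passes per frame:", multi_cost)
    print("recurrent passes per frame:  ", recurrent_cost)
```

In this toy setting the multi-frame path's cost grows with the window length until it saturates at `window`, while the recurrent path always performs a single sampling pass and stores only the carried state. Because the state is propagated indefinitely, information from frames older than any fixed window can still influence the current output, which is the intuition behind the long-term fusion benefit described above.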

Related benchmarks

Task	Dataset	Result
3D Object Detection	nuScenes (val)	NDS 53.9	981
3D Object Detection	nuScenes v1.0 (test)	mAP 55.7	230
3D Object Detection	nuScenes v1.0 (val)	mAP (Overall) 53.8	207
3D Object Detection	nuScenes v1.0-trainval (val)	NDS 59.4	182
3D Object Detection	Argoverse 2 (val)	mAP 18.9	101
3D Object Detection	nuScenes (val)	mAP 50.5	19
3D Object Detection	nuScenes v1.1 (val)	NDS 53.9	14
Image Classification	ImageNet	Top-1 Accuracy 66.1	9
