Sparse4D v2: Recurrent Temporal Fusion with Sparse Model
About
Sparse algorithms offer great flexibility for multi-view temporal perception tasks. In this paper, we present an enhanced version of Sparse4D, in which we improve the temporal fusion module by implementing a recursive form of multi-frame feature sampling. By effectively decoupling image features and structured anchor features, Sparse4D enables a highly efficient transformation of temporal features, thereby facilitating temporal fusion solely through the frame-by-frame transmission of sparse features. The recurrent temporal fusion approach provides two main benefits. Firstly, it reduces the computational complexity of temporal fusion from $O(T)$ to $O(1)$, resulting in significant improvements in inference speed and memory usage. Secondly, it enables the fusion of long-term information, leading to more pronounced performance improvements due to temporal fusion. Our proposed approach, Sparse4Dv2, further enhances the performance of the sparse perception algorithm and achieves state-of-the-art results on the nuScenes 3D detection benchmark. Code will be available at https://github.com/linxuewu/Sparse4D.
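The $O(T)$ versus $O(1)$ claim can be illustrated with a toy sketch. This is not the Sparse4D implementation: the function names, the list-of-floats feature representation, and the exponential-moving-average update are all assumptions chosen for illustration. The sketch only counts how many frames each strategy has to touch per time step: a multi-frame sampler re-reads a window of past frames, while a recurrent scheme reads the current frame and a carried sparse state.

```python
from typing import List, Optional, Tuple

Features = List[float]  # hypothetical stand-in for a set of sparse instance features


def multi_frame_fusion(history: List[Features], window: int) -> Tuple[Features, int]:
    """Non-recurrent fusion: re-sample the last `window` frames at every step.

    Returns the fused features and the number of frames that were read.
    """
    recent = history[-window:]
    fused = [sum(column) / len(recent) for column in zip(*recent)]
    return fused, len(recent)


def recurrent_fusion(
    state: Optional[Features], current: Features, momentum: float = 0.5
) -> Tuple[Features, int]:
    """Recurrent fusion: combine the carried state with the current frame only.

    The moving-average update is an illustrative assumption, not the paper's
    fusion operator. Exactly one frame is read per step.
    """
    if state is None:
        return list(current), 1
    fused = [momentum * s + (1.0 - momentum) * c for s, c in zip(state, current)]
    return fused, 1


def count_frame_reads(num_frames: int, window: int) -> Tuple[int, int]:
    """Run both strategies over a synthetic sequence and total the frame reads."""
    frames = [[float(t), float(t) + 1.0] for t in range(num_frames)]
    multi_reads = 0
    recurrent_reads = 0
    state: Optional[Features] = None
    for t in range(num_frames):
        _, reads = multi_frame_fusion(frames[: t + 1], window)
        multi_reads += reads
        state, reads = recurrent_fusion(state, frames[t])
        recurrent_reads += reads
    return multi_reads, recurrent_reads
```

Under these assumptions, the per-step cost of the windowed variant grows with the window length, whereas the recurrent variant reads one frame per step regardless of how much history the carried state summarizes.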
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Object Detection | nuScenes (val) | NDS | 53.9 | 941 |
| 3D Object Detection | nuScenes v1.0 (test) | mAP | 55.6 | 210 |
| 3D Object Detection | nuScenes v1.0 (val) | mAP (Overall) | 50.5 | 190 |
| 3D Object Detection | nuScenes v1.0-trainval (val) | NDS | 53.9 | 87 |
| 3D Object Detection | Argoverse 2 (val) | mAP | 18.9 | 62 |
| 3D Object Detection | nuScenes v1.1 (val) | NDS | 53.9 | 14 |
| Image Classification | ImageNet | Top-1 Accuracy | 66.1 | 9 |