BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
About
3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention, in which each BEV query extracts spatial features from regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse historical BEV information. Our approach achieves a new state-of-the-art 56.9% NDS on the nuScenes test set, which is 9.0 points higher than the previous best methods and on par with LiDAR-based baselines. We further show that BEVFormer remarkably improves the accuracy of velocity estimation and the recall of objects under low-visibility conditions. The code is available at https://github.com/zhiqi-li/BEVFormer.
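The data flow described in the abstract (grid-shaped BEV queries, a temporal step that fuses the previous BEV, then a spatial step that reads camera features) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes plain single-head dot-product attention, attends over all camera features rather than only regions of interest, and fuses each cell with just the same cell of the previous BEV. The names `attend`, `make_bev_queries`, and `bev_step` are hypothetical and chosen for illustration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def make_bev_queries(height, width, dim):
    """Hypothetical helper: one query vector per cell of a height x width grid.

    Real BEV queries are learned parameters; here they are filled with a
    deterministic pattern so the example is reproducible.
    """
    return [[(r * width + c + d) * 0.01 for d in range(dim)]
            for r in range(height) for c in range(width)]


def bev_step(bev_queries, prev_bev, camera_feats):
    """One toy update: temporal self-attention, then spatial cross-attention.

    bev_queries:  flattened grid of query vectors (height * width entries)
    prev_bev:     BEV features from the previous frame, or None for frame 0
    camera_feats: one list of feature vectors per camera view
    """
    # Simplification: every query sees every camera feature. The paper
    # restricts each query to regions of interest across camera views.
    all_feats = [feat for cam in camera_feats for feat in cam]
    new_bev = []
    for i, query in enumerate(bev_queries):
        # Temporal step: fuse the query with the same cell of the previous BEV.
        history = [query] if prev_bev is None else [query, prev_bev[i]]
        query = attend(query, history, history)
        # Spatial step: aggregate image features from the camera views.
        query = attend(query, all_feats, all_feats)
        new_bev.append(query)
    return new_bev


# Usage: carry the BEV forward across frames, as in a recurrent model.
queries = make_bev_queries(height=2, width=2, dim=2)
frames = [
    [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]],  # frame 0: two cameras
    [[[0.9, 0.1], [0.1, 0.9]], [[0.4, 0.6]]],  # frame 1: two cameras
]
bev = None
for camera_feats in frames:
    bev = bev_step(queries, bev, camera_feats)
```

The loop at the end shows the recurrent aspect: the BEV produced for one frame is passed back in as `prev_bev` for the next, so history is accumulated without storing every past frame.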
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Object Detection | nuScenes (val) | NDS | 51.7 | 941 |
| 3D Object Detection | nuScenes (test) | mAP | 48.9 | 829 |
| Semantic segmentation | nuScenes (val) | -- | -- | 212 |
| 3D Object Detection | nuScenes v1.0 (test) | mAP | 48.1 | 210 |
| 3D Object Detection | nuScenes v1.0 (val) | mAP (Overall) | 41.6 | 190 |
| 3D Object Detection | Waymo Open Dataset (val) | -- | -- | 175 |
| 3D Occupancy Prediction | Occ3D-nuScenes (val) | mIoU | 2.37e+3 | 144 |
| Object Detection | nuScenes (val) | mAP | 41.5 | 41 |
| Semantic Occupancy Prediction | Occ3D (val) | mIoU | 39.3 | 37 |
| 3D Semantic Occupancy Prediction | SurroundOcc (val) | mIoU | 0.168 | 36 |