BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

About

3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention, in which each BEV query extracts spatial features from regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse historical BEV information. Our approach achieves a new state-of-the-art 56.9% NDS on the nuScenes test set, which is 9.0 points higher than the previous best methods and on par with LiDAR-based baselines. We further show that BEVFormer remarkably improves the accuracy of velocity estimation and the recall of objects under low-visibility conditions. The code is available at https://github.com/zhiqi-li/BEVFormer.
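
The idea of grid-shaped BEV queries looking up regions of interest across camera views can be illustrated with a small geometric sketch. This is not the authors' implementation: the Camera class, the pinhole model, the 200 x 200 grid with 0.512 m cells, the pillar heights, and the helper names (bev_cell_center, project, hit_views) are all assumptions made for illustration. It only shows how one BEV cell might be lifted to a few 3D points and matched to the cameras that actually see it; the attention and feature sampling steps are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical pinhole camera mounted on the ego vehicle (illustration only)."""
    yaw: float            # direction of the optical axis in the ego frame, radians
    focal: float          # focal length in pixels
    width: int            # image width in pixels
    height: int           # image height in pixels
    mount_z: float = 1.5  # assumed camera height above the ground, metres


def bev_cell_center(row, col, grid_size, cell_m):
    """Map a BEV grid index to ego-frame (x, y) in metres, with the ego at the grid centre."""
    x = (col + 0.5 - grid_size / 2) * cell_m  # x points forward
    y = (row + 0.5 - grid_size / 2) * cell_m  # y points left
    return x, y


def project(cam, x, y, z):
    """Project an ego-frame point into the image; return (u, v), or None if not visible."""
    forward = x * math.cos(cam.yaw) + y * math.sin(cam.yaw)
    left = -x * math.sin(cam.yaw) + y * math.cos(cam.yaw)
    up = z - cam.mount_z
    if forward <= 0.1:  # behind the camera or too close to the image plane
        return None
    u = cam.width / 2 - cam.focal * left / forward
    v = cam.height / 2 - cam.focal * up / forward
    if 0 <= u < cam.width and 0 <= v < cam.height:
        return u, v
    return None


def hit_views(cams, row, col, grid_size=200, cell_m=0.512,
              heights=(0.0, 1.0, 2.0, 3.0)):
    """For one BEV query, return {camera index: [(u, v), ...]} for the views it lands in."""
    x, y = bev_cell_center(row, col, grid_size, cell_m)
    hits = {}
    for index, cam in enumerate(cams):
        points = []
        for z in heights:  # sample reference points along a vertical pillar
            uv = project(cam, x, y, z)
            if uv is not None:
                points.append(uv)
        if points:
            hits[index] = points
    return hits


# Six assumed surround cameras, 60 degrees apart, each with roughly 65 degrees of horizontal view.
cams = [Camera(yaw=math.radians(60 * k), focal=1260.0, width=1600, height=900)
        for k in range(6)]
print(sorted(hit_views(cams, row=100, col=120)))  # cell about 10 m straight ahead
print(sorted(hit_views(cams, row=109, col=116)))  # cell about 30 degrees to the left
```

In this toy setup, a cell straight ahead falls inside a single camera, while a cell near the boundary between two fields of view falls inside both. A cross-attention layer restricted to those views would only need to sample image features around the projected points rather than attend to every pixel of every camera.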

Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai • 2022

Related benchmarks

Task                              Dataset                     Result              Rank
3D Object Detection               nuScenes (val)              NDS 51.7            941
3D Object Detection               nuScenes (test)             mAP 48.9            829
Semantic segmentation             nuScenes (val)              --                  212
3D Object Detection               NuScenes v1.0 (test)        mAP 48.1            210
3D Object Detection               nuScenes v1.0 (val)         mAP (Overall) 41.6  190
3D Object Detection               Waymo Open Dataset (val)    --                  175
3D Occupancy Prediction           Occ3D-nuScenes (val)        mIoU 2.37e+3        144
Object Detection                  nuScenes (val)              mAP 41.5            41
Semantic Occupancy Prediction     Occ3D (val)                 mIoU 39.3           37
3D Semantic Occupancy Prediction  SurroundOcc (val)           mIoU 0.168          36
Showing 10 of 51 rows
