Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?
About
Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors. Recent research has developed a variety of camera-only methods, where features are differentiably "lifted" from the multi-camera images onto the 2D ground plane, yielding a "bird's eye view" (BEV) feature representation of the 3D space around the vehicle. This line of work has produced a variety of novel "lifting" methods, but we observe that other details in the training setups have shifted at the same time, making it unclear what really matters in top-performing methods. We also observe that using cameras alone is not a real-world constraint, considering that additional sensors like radar have been integrated into real vehicles for years. In this paper, we first attempt to elucidate the high-impact factors in the design and training protocol of BEV perception models. We find that batch size and input resolution greatly affect performance, while lifting strategies have a more modest effect -- even a simple parameter-free lifter works well. Second, we demonstrate that radar data can provide a substantial boost to performance, helping to close the gap between camera-only and LiDAR-enabled systems. We analyze the radar usage details that lead to good performance, and invite the community to reconsider this commonly neglected part of the sensor platform.
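To make the idea of a parameter-free lifter concrete, the following is a minimal sketch of one way to do it: project each 3D grid point into every camera, bilinearly sample the image features there, and average whatever was sampled. It is an illustration under stated assumptions, not the authors' implementation. The function names (`project`, `bilinear_sample`, `lift_to_bev`), the single-channel feature maps, the `to_cam` callable, and the averaging over height are all hypothetical simplifications chosen for brevity.

```python
import math


def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (x right, y down, z forward)."""
    x, y, z = point_cam
    if z <= 1e-6:
        return None  # point is behind (or on) the image plane
    return fx * x / z + cx, fy * y / z + cy


def bilinear_sample(feat, u, v):
    """Sample feat[row][col] at continuous pixel (u=col, v=row); None if outside."""
    h, w = len(feat), len(feat[0])
    if u < 0 or v < 0 or u > w - 1 or v > h - 1:
        return None
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    top = feat[v0][u0] * (1 - du) + feat[v0][u1] * du
    bottom = feat[v1][u0] * (1 - du) + feat[v1][u1] * du
    return top * (1 - dv) + bottom * dv


def lift_to_bev(cameras, xs, zs, heights):
    """Build a BEV grid (rows over zs, columns over xs) with no learned parameters.

    cameras: list of dicts with keys
      'feat'       -- 2D list of feature values (one channel, for simplicity)
      'intrinsics' -- (fx, fy, cx, cy)
      'to_cam'     -- callable mapping a vehicle-frame (x, y, z) to the camera frame
    Assumption: the height axis is collapsed by plain averaging.
    """
    bev = []
    for z in zs:
        row = []
        for x in xs:
            samples = []
            for y in heights:
                for cam in cameras:
                    uv = project(cam["to_cam"]((x, y, z)), *cam["intrinsics"])
                    if uv is None:
                        continue
                    value = bilinear_sample(cam["feat"], *uv)
                    if value is not None:
                        samples.append(value)
            # Cells seen by no camera are left at zero.
            row.append(sum(samples) / len(samples) if samples else 0.0)
        bev.append(row)
    return bev
```

Because the sampling weights depend only on geometry, the lifting step itself has nothing to train; all learning happens in the image encoder before it and the BEV network after it.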
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| BEV segmentation (Vehicle) | nuScenes v1.0-trainval (val) | Vehicle BEV IoU | 44.9 | 28 |
| BEV Semantic Segmentation | nuScenes (val) | Drivable Area IoU | 77.7 | 28 |
| BEV Segmentation | nuScenes (val) | Vehicle Segmentation Score | 55.7 | 16 |
| Vehicle Segmentation | nuScenes (val) | mIoU | 60.8 | 14 |
| BEV vehicle segmentation | nuScenes (val) | IoU (No Filter, 224x480) | 36.9 | 11 |
| BEV Segmentation | Dur360BEV (val) | IoU @ 1.0 | 31.1 | 8 |
| Vehicle Segmentation | nuScenes Setting 2: 100m x 100m at 50cm resolution v1.0-trainval (val) | mIoU | 47.4 | 7 |
| BEV vehicle segmentation | Lyft L5 FIERY | mIoU (Long Range) | 44.5 | 7 |
| BEV vehicle segmentation | nuScenes | Vehicle Segmentation IoU | 36.9 | 6 |
| Vehicle Segmentation | nuScenes | Vehicle IoU | 55.7 | 4 |
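
Most of the metrics listed above are variants of intersection-over-union (IoU) between a predicted and a ground-truth BEV occupancy mask. As a rough illustration, the sketch below computes IoU for a single class on a single grid. The function name `bev_iou`, the 0.5 threshold, and the convention of returning 0.0 for an empty union are assumptions for this example; individual benchmarks may apply different thresholds, visibility filters, grid extents, or averaging schemes.

```python
def bev_iou(pred, target, threshold=0.5):
    """IoU between a predicted probability grid and a binary ground-truth grid.

    pred:   2D list of probabilities in [0, 1]
    target: 2D list of 0/1 occupancy labels, same shape as pred
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_on = p >= threshold  # binarize the prediction
            t_on = bool(t)
            intersection += int(p_on and t_on)
            union += int(p_on or t_on)
    # Assumed convention: an empty union (nothing predicted, nothing labeled) scores 0.
    return intersection / union if union else 0.0


pred = [[0.9, 0.2],
        [0.7, 0.1]]
target = [[1, 0],
          [0, 1]]
score = bev_iou(pred, target)  # one overlapping cell out of three occupied cells
```

The function returns a fraction in [0, 1]; multiply by 100 to compare against leaderboard-style numbers.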