Street-View Image Generation from a Bird's-Eye View Layout
About
Bird's-Eye View (BEV) perception has received increasing attention in recent years, as it provides a concise and unified spatial representation across views and benefits a diverse set of downstream driving applications. At the same time, data-driven simulation for autonomous driving has been a focal point of recent research, yet few approaches are both fully data-driven and controllable. Instead of using perception data from real-life scenarios, an ideal simulation model would generate realistic street-view images that align with a given HD map and traffic layout, a task that is critical for visualizing complex traffic scenarios and developing robust perception models for autonomous driving. In this paper, we propose BEVGen, a conditional generative model that synthesizes a set of realistic and spatially consistent surrounding images that match the BEV layout of a traffic scenario. BEVGen incorporates a novel cross-view transformation with a spatial attention design, which learns the relationship between camera and map views to ensure their consistency. We evaluate the proposed model on the challenging nuScenes and Argoverse 2 datasets. After training, BEVGen can accurately render road and lane lines, as well as generate traffic scenes under diverse weather conditions and times of day.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Object Detection | nuScenes (val) | -- | 41 |
| Map Segmentation | nuScenes (val) | -- | 23 |
| Video Prediction | nuScenes (val) | FID 25.54 | 16 |
| Camera Generation | nuScenes v1.0-trainval (val) | FID 25.54 | 11 |
| Driving Scene Generation | nuScenes (val) | FID 25.54 | 9 |
| Controllable Image Generation | nuScenes (val) | FID 25.5 | 7 |
| Controllable Multi-view Video Generation | nuScenes (val) | mIoU (Foreground) 5.89 | 7 |
| Object Detection | nuScenes v1.0-trainval (val) | AP (Car) 47.3 | 5 |
| Map Segmentation | nuScenes v1.0-trainval (val) | mIoU Road 71.9 | 5 |
| Place Recognition | nuScenes (val) | AR@1 31.2 | 4 |