Cross-view Semantic Segmentation for Sensing Surroundings
About
Sensing surroundings plays a crucial role in human spatial perception, as it extracts the spatial configuration of objects as well as the free space from observations. To equip robots with such a surrounding sensing capability, we introduce a novel visual task called Cross-view Semantic Segmentation, as well as a framework named View Parsing Network (VPN) to address it. In the cross-view semantic segmentation task, the agent is trained to parse first-view observations into a top-down-view semantic map that indicates the spatial location of all objects at the pixel level. The main challenge of this task is the lack of real-world annotations for top-down-view data. To mitigate this, we train the VPN in a 3D graphics environment and use domain adaptation to transfer it to real-world data. We evaluate our VPN on both synthetic and real-world agents. The experimental results show that our model effectively uses information from different views and modalities to understand spatial information. A further experiment on a LoCoBot robot shows that our model enables surrounding sensing from 2D image input. Code and demo videos can be found at https://view-parsing-network.github.io.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | nuScenes (val) | -- | 212 |
| LiDAR Semantic Segmentation | nuScenes official (test) | mIoU 25.5 | 132 |
| BEV Semantic Segmentation | nuScenes (val) | Drivable Area IoU 58 | 28 |
| BeV Segmentation | nuScenes v1.0 (val) | Drivable Area 65.97 | 25 |
| BeV Segmentation | nuScenes (val) | Vehicle Segmentation Score 28.2 | 16 |
| Map Segmentation | nuScenes 60m x 30m setting (val) | Divider 36.5 | 11 |
| Map-view Semantic Segmentation | Argoverse (val) | Vehicle IoU 23.9 | 9 |
| Top-view semantic segmentation | Argoverse Road | mIoU 71.07 | 8 |
| Top-view semantic segmentation | Argoverse Vehicle | mIoU 16.58 | 8 |
| Vehicle map-view segmentation | nuScenes | mIoU 25.5 | 8 |
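
Most results above are reported as IoU or mIoU on top-down-view maps. As a rough illustration of how such a score is commonly computed, the sketch below evaluates per-class intersection-over-union on a toy top-down grid. It is a minimal sketch under stated assumptions, not the evaluation code of any listed benchmark: the function names `per_class_iou` and `mean_iou`, the 3x3 grid, and the two-class labelling (0 for free space, 1 for vehicle) are all hypothetical, and each benchmark may differ in its class set, ignored labels, and averaging.

```python
from typing import Dict, Sequence

# A top-down semantic map as a 2D grid of integer class ids (illustrative type).
Grid = Sequence[Sequence[int]]


def per_class_iou(pred: Grid, target: Grid, num_classes: int) -> Dict[int, float]:
    """Return IoU for every class that appears in the prediction or the target."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # The cell counts toward both intersection and union of that class.
                inter[p] += 1
                union[p] += 1
            else:
                # A mismatch enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    return {c: inter[c] / union[c] for c in range(num_classes) if union[c] > 0}


def mean_iou(pred: Grid, target: Grid, num_classes: int) -> float:
    """Average IoU over the classes that occur (absent classes are skipped)."""
    ious = per_class_iou(pred, target, num_classes)
    if not ious:
        raise ValueError("no class occurs in either map")
    return sum(ious.values()) / len(ious)


# Hypothetical toy maps: 0 = free space, 1 = vehicle.
pred = [[0, 0, 1],
        [0, 1, 1],
        [0, 0, 0]]
target = [[0, 0, 1],
          [0, 0, 1],
          [0, 0, 0]]

print(per_class_iou(pred, target, num_classes=2))
print(mean_iou(pred, target, num_classes=2))
```

In this toy case the vehicle class has 2 correctly predicted cells out of a union of 3, so its IoU is 2/3, while free space scores 6/7. Benchmark tables typically report these values as percentages.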