Seamless Scene Segmentation
About
In this work we introduce a novel, CNN-based architecture that can be trained end-to-end to deliver seamless scene segmentation results. Our goal is to predict consistent semantic segmentation and detection results by means of a panoptic output format, going beyond the simple combination of independently trained segmentation and detection models. The proposed architecture takes advantage of a novel segmentation head that seamlessly integrates multi-scale features generated by a Feature Pyramid Network with contextual information conveyed by a light-weight DeepLab-like module. As an additional contribution, we review the panoptic metric and propose an alternative that overcomes its limitations when evaluating non-instance categories. Our proposed network architecture yields state-of-the-art results on three challenging street-level datasets, i.e., Cityscapes, Indian Driving Dataset and Mapillary Vistas.
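For readers unfamiliar with the panoptic metric mentioned in the abstract, the following is a minimal sketch of the standard panoptic quality (PQ) computation for a single class, as commonly defined in the panoptic segmentation literature: predicted and ground-truth segments are matched when their IoU exceeds 0.5, and PQ is the sum of matched IoUs divided by TP + 0.5·FP + 0.5·FN. It does not implement the alternative metric proposed in this work. The function name `panoptic_quality` and its arguments are hypothetical, chosen only for illustration.

```python
def panoptic_quality(pair_ious, num_pred, num_gt):
    """Standard per-class PQ (illustrative sketch, not the paper's alternative metric).

    pair_ious: IoU values of candidate (prediction, ground truth) segment pairs.
               Assumption: each segment appears in at most one pair with IoU > 0.5,
               which holds automatically for non-overlapping panoptic segments.
    num_pred:  total number of predicted segments of this class.
    num_gt:    total number of ground-truth segments of this class.
    """
    # A pair counts as a true positive only if its IoU is strictly above 0.5.
    matched = [iou for iou in pair_ious if iou > 0.5]
    tp = len(matched)
    fp = num_pred - tp  # predicted segments left unmatched
    fn = num_gt - tp    # ground-truth segments left unmatched

    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        # No segments at all for this class; returning 0.0 is a convention chosen here.
        return 0.0
    return sum(matched) / denom


if __name__ == "__main__":
    # Three predictions, four ground-truth segments, two pairs above the threshold.
    print(panoptic_quality([0.8, 0.6, 0.3], num_pred=3, num_gt=4))
```

In the usage example, two pairs are matched (IoU 0.8 and 0.6), leaving one false positive and two false negatives, so the score is 1.4 / 3.5, approximately 0.4. The hard 0.5 threshold applies to every class, including non-instance ("stuff") categories that form a single segment per image, which is the setting the abstract refers to when it discusses the metric's limitations.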
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Semantic segmentation | Cityscapes (val) | mIoU | 80.7 | 572 |
| Panoptic Segmentation | Cityscapes (val) | PQ | 65 | 276 |
| Instance Segmentation | Cityscapes (val) | AP | 33.6 | 239 |
| Panoptic Segmentation | Mapillary Vistas (val) | PQ | 37.7 | 82 |
| Semantic segmentation | Mapillary Vistas (val) | mIoU | 50.4 | 72 |
| Panoptic Segmentation | Cityscapes (test) | PQ | 62.6 | 51 |
| Semantic segmentation | WildDash bench (test) | mIoU Meta Avg (cla) | 37.9 | 19 |
| Instance Segmentation | Mapillary Vistas Dataset (val) | AP | 16.4 | 19 |
| Semantic segmentation | WildDash 2 (val) | mIoU | 37.1 | 9 |
| Hierarchical Semantic Segmentation | Mapillary Vistas 2.0 (val) | mIoU (Level 1) | 38.17 | 9 |