The Cityscapes Dataset for Semantic Urban Scene Understanding
About
Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes. To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. Cityscapes comprises a large, diverse set of stereo video sequences recorded in streets from 50 different cities. 5000 of these images have high-quality pixel-level annotations; 20000 additional images have coarse annotations to enable methods that leverage large volumes of weakly labeled data. Crucially, our effort exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity. Our accompanying empirical study provides an in-depth analysis of the dataset characteristics, as well as a performance evaluation of several state-of-the-art approaches based on our benchmark.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | Cityscapes (test) | mIoU 64.23 | 1145 |
| Instance segmentation | COCO 2017 (val) | -- | 1144 |
| Semantic segmentation | Cityscapes (val) | mIoU 55.07 | 572 |
| Semantic segmentation | GTA5 → Cityscapes (val) | mIoU 48.6 | 533 |
| Semantic segmentation | CamVid (test) | mIoU 48.52 | 411 |
| Instance segmentation | Cityscapes (val) | -- | 239 |
| Semantic segmentation | SUN RGB-D (test) | mIoU 15.47 | 191 |
| Instance segmentation | Cityscapes (test) | AP (Overall) 4.6 | 122 |
| Semantic segmentation | VOC 2012 (val) | mIoU 29.2 | 67 |
| Semantic segmentation | Cityscapes trained on SYNTHIA (val) | Road IoU 86.2 | 60 |
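Most results above are reported as mIoU (mean intersection over union). As a rough illustration of how such a score can be computed, the sketch below averages per-class IoU = TP / (TP + FP + FN) over a confusion matrix. It is not the official Cityscapes evaluation code. The function names, the toy labels, and the choice of 255 as the "ignore" label are assumptions made for this example.

```python
def confusion_matrix(y_true, y_pred, num_classes, ignore_label=255):
    """Count (true, predicted) label pairs over flat lists of per-pixel labels.

    ignore_label is an assumption for this sketch: pixels whose ground-truth
    label equals it are excluded from scoring.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        if t == ignore_label:
            continue  # unlabeled pixel, skip it
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average per-class IoU, skipping classes that never occur."""
    num_classes = len(matrix)
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]                                         # true positives
        fn = sum(matrix[c]) - tp                                  # missed pixels of class c
        fp = sum(matrix[r][c] for r in range(num_classes)) - tp   # wrongly predicted as c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: six pixels, three classes, last pixel is unlabeled.
y_true = [0, 0, 1, 1, 2, 255]
y_pred = [0, 1, 1, 1, 2, 0]
miou = mean_iou(confusion_matrix(y_true, y_pred, num_classes=3))
print(f"mIoU = {100 * miou:.2f}")
```

Here the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18. The table reports scores on a 0 to 100 scale, hence the multiplication by 100 when printing.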