The Cityscapes Dataset for Semantic Urban Scene Understanding

About

Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes. To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. Cityscapes comprises a large, diverse set of stereo video sequences recorded in the streets of 50 different cities. Of these images, 5000 have high-quality pixel-level annotations; 20000 additional images have coarse annotations to enable methods that leverage large volumes of weakly labeled data. Crucially, our effort exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity. Our accompanying empirical study provides an in-depth analysis of the dataset characteristics, as well as a performance evaluation of several state-of-the-art approaches based on our benchmark.

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele • 2016

Related benchmarks

Task	Dataset	Result
Instance Segmentation	COCO 2017 (val)	--	1304
Semantic segmentation	Cityscapes (test)	mIoU 64.23	1254
Semantic segmentation	GTA5 → Cityscapes (val)	mIoU 48.6	586
Semantic segmentation	Cityscapes (val)	mIoU 55.07	572
Semantic segmentation	CamVid (test)	mIoU 48.52	411
Instance Segmentation	Cityscapes (val)	--	247
Semantic segmentation	SUN RGB-D (test)	mIoU 15.47	212
Instance Segmentation	Cityscapes (test)	AP (Overall) 4.6	122
Semantic segmentation	VOC 2012 (val)	mIoU 29.2	76
Instance Segmentation	ScanNet (val)	--	62

Showing 10 of 18 rows
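
Most rows above report mIoU (mean intersection-over-union). As a rough illustration of how such a score can be computed, the sketch below accumulates per-class true positives, false positives, and false negatives over flat lists of class ids and averages IoU = TP / (TP + FP + FN) over the classes that occur. The function name `mean_iou`, the `ignore_label` value of 255, and the toy labels are assumptions made for this example; this is not the official Cityscapes evaluation script, and predictions are assumed to lie in `[0, num_classes)`.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean IoU over flat sequences of predicted and ground-truth class ids.

    Pixels whose ground-truth label equals ignore_label are skipped
    (255 is an assumed convention for this sketch).
    """
    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    fn = [0] * num_classes  # false negatives per class

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p where it does not belong
            fn[t] += 1  # missed a pixel of class t

    # Average only over classes that appear in predictions or ground truth.
    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        if denom > 0:
            ious.append(tp[c] / denom)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: six pixels, three classes, last pixel ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.2f}")
```

The table lists mIoU on a 0 to 100 scale, so the fraction returned here is multiplied by 100 before printing. Accumulating the counts over a whole dataset before dividing, rather than averaging per-image scores, keeps rare classes from being dominated by the images in which they are absent.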
