Unified Perceptual Parsing for Scene Understanding
About
Humans recognize the visual world at multiple levels: we effortlessly categorize scenes and detect the objects inside them, while also identifying the textures and surfaces of those objects along with their compositional parts. In this paper, we study a new task called Unified Perceptual Parsing, which requires machine vision systems to recognize as many visual concepts as possible from a given image. A multi-task framework called UPerNet and a training strategy are developed to learn from heterogeneous image annotations. We benchmark our framework on Unified Perceptual Parsing and show that it effectively segments a wide range of concepts from images. The trained networks are further applied to discover visual knowledge in natural scenes. Models are available at https://github.com/CSAILVision/unifiedparsing.
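To make the multi-task idea concrete, below is a minimal PyTorch sketch of a shared encoder with one prediction head per perceptual level (scene, object, material), where each head only receives a loss when a batch's data source actually carries that annotation. This is an illustrative sketch under stated assumptions, not the authors' implementation: the backbone, head names, channel widths, and class counts are all placeholders, and the official code lives at the repository linked above.

```python
# Illustrative sketch only -- not the official UPerNet code (see the repo
# above). Backbone, channel widths, head names, and class counts are
# assumptions chosen to keep the example small.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadParser(nn.Module):
    """Shared encoder feeding one prediction head per perceptual level."""

    def __init__(self, num_scenes=365, num_objects=150, num_materials=26):
        super().__init__()
        # Tiny stand-in encoder; the paper uses a much deeper backbone
        # with feature-pyramid-style fusion.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Scene is an image-level label: global pooling + linear classifier.
        self.scene_head = nn.Linear(128, num_scenes)
        # Object and material are pixel-level labels: 1x1 conv classifiers.
        self.object_head = nn.Conv2d(128, num_objects, 1)
        self.material_head = nn.Conv2d(128, num_materials, 1)

    def forward(self, x):
        feats = self.encoder(x)
        size = x.shape[2:]
        return {
            "scene": self.scene_head(feats.mean(dim=(2, 3))),
            "object": F.interpolate(self.object_head(feats), size,
                                    mode="bilinear", align_corners=False),
            "material": F.interpolate(self.material_head(feats), size,
                                      mode="bilinear", align_corners=False),
        }


def heterogeneous_loss(outputs, targets):
    """Sum losses only over the tasks a batch is annotated with, so data
    sources with different label types can be mixed during training."""
    loss = outputs["scene"].sum() * 0.0  # zero tensor on the right device
    if "scene" in targets:                       # image-level labels
        loss = loss + F.cross_entropy(outputs["scene"], targets["scene"])
    for task in ("object", "material"):          # pixel-level labels
        if task in targets:
            loss = loss + F.cross_entropy(outputs[task], targets[task])
    return loss


if __name__ == "__main__":
    model = MultiHeadParser()
    out = model(torch.randn(2, 3, 64, 64))
    # A batch from an object-segmentation source: only object labels exist,
    # so only the object head contributes to the loss.
    targets = {"object": torch.randint(0, 150, (2, 64, 64))}
    print(heterogeneous_loss(out, targets).item())
```

The per-source loss gating is the part that mirrors the training strategy described above: each head is updated only on images whose annotation source covers that concept.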
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | ADE20K (val) | mIoU 57 | 2731 |
| Semantic segmentation | ADE20K | mIoU 56.3 | 936 |
| Semantic segmentation | Cityscapes (val) | -- | 572 |
| Semantic segmentation | Cityscapes (val) | mIoU 81.1 | 332 |
| Semantic segmentation | COCO-Stuff (test) | mIoU 43.4 | 184 |
| Semantic segmentation | LoveDA | mIoU 52.44 | 142 |
| Semantic segmentation | PASCAL Context | -- | 111 |
| Video semantic segmentation | VSPW (val) | mIoU 37.5 | 92 |
| Semantic segmentation | LoveDA (test) | mIoU 47.6 | 81 |
| Semantic segmentation | ADE20K v1 (val) | -- | 76 |