Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing

About

Compared with image scene parsing, video scene parsing introduces temporal information, which can effectively improve the consistency and accuracy of prediction. In this paper, we propose a Spatial-Temporal Semantic Consistency method to capture class-exclusive context information. Specifically, we design a spatial-temporal consistency loss to constrain the semantic consistency in spatial and temporal dimensions. In addition, we adopt an pseudo-labeling strategy to enrich the training dataset. We obtain the scores of 59.84% and 58.85% mIoU on development (test part 1) and testing set of VSPW, respectively. And our method wins the 1st place on VSPW challenge at ICCV2021.

Xingjian He, Weining Wang, Zhiyong Xu, Hao Wang, Jie Jiang, Jing Liu• 2021

Related benchmarks

TaskDatasetResultRank
Video Semantic SegmentationVSPW (val)
mIoU59.3
92
Video Semantic SegmentationVSPW old codalab (test)
mIoU (%)58.85
5
Showing 2 of 2 rows

Other info

Follow for update