Semantic Video CNNs through Representation Warping

About

In this work, we propose a technique to convert CNN models for semantic segmentation of static images into CNNs for video data. We describe a warping method that can be used to augment existing architectures with very little extra computational cost. This module is called NetWarp, and we demonstrate its use for a range of network architectures. The main design principle is to use the optical flow between adjacent frames to warp internal network representations across time. A key insight of this work is that fast optical flow methods can be combined with many different CNN architectures for improved performance and end-to-end training. Experiments validate that the proposed approach incurs only a small extra computational cost while improving performance when video streams are available. We achieve new state-of-the-art results on the CamVid and Cityscapes benchmark datasets and show consistent improvements over different baseline networks. Our code and models will be available at http://segmentation.is.tue.mpg.de

Raghudeep Gadde, Varun Jampani, Peter V. Gehler • 2017
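
The design principle in the abstract, warping an internal representation from the previous frame into the current one along the optical flow, can be illustrated with a minimal sketch. This is not the authors' NetWarp implementation. The function names `warp_features` and `combine`, the flow convention (each vector points from a current-frame pixel to its match in the previous frame), the border clamping, and the fixed scalar weights are all assumptions made for illustration, since the abstract does not specify how warped and current features are merged. Plain Python lists keep the sketch dependency-free; a real model would use a tensor library.

```python
import math


def warp_features(prev_feat, flow):
    """Bilinearly warp prev_feat (H x W x C nested lists) into the current frame.

    Assumption: flow[y][x] = (dx, dy) points from pixel (x, y) in the current
    frame to its matching location in the previous frame.
    """
    h, w = len(prev_feat), len(prev_feat[0])
    channels = len(prev_feat[0][0])
    warped = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow[y][x]
            # Sampling location in the previous frame, clamped to the border.
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            wx, wy = sx - x0, sy - y0
            # Interpolate between the four neighbouring feature vectors.
            row.append([
                (1 - wy) * ((1 - wx) * prev_feat[y0][x0][c] + wx * prev_feat[y0][x1][c])
                + wy * ((1 - wx) * prev_feat[y1][x0][c] + wx * prev_feat[y1][x1][c])
                for c in range(channels)
            ])
        warped.append(row)
    return warped


def combine(curr_feat, warped_prev, w_curr=0.5, w_prev=0.5):
    """Weighted sum of current and warped features (weights are illustrative)."""
    return [
        [[w_curr * a + w_prev * b for a, b in zip(pix_c, pix_w)]
         for pix_c, pix_w in zip(row_c, row_w)]
        for row_c, row_w in zip(curr_feat, warped_prev)
    ]


# Tiny 2 x 3 feature map with one channel; the value encodes the position.
prev = [[[float(10 * y + x)] for x in range(3)] for y in range(2)]
shift_right = [[(1.0, 0.0)] * 3 for _ in range(2)]
warped = warp_features(prev, shift_right)
fused = combine(prev, warped)
```

Because every step is a weighted sum of feature values, the operation is differentiable with respect to the features, which is consistent with the end-to-end training the abstract mentions.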

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | Cityscapes (test) | mIoU 80.5 | 1145 |
| Semantic segmentation | CamVid (test) | mIoU 67.1 | 411 |
| Video Semantic Segmentation | Cityscapes (val) | mIoU 80.6 | 91 |
| Video Semantic Segmentation | VSPW (test) | mIoU 37.5 | 25 |
| Video Semantic Segmentation | CamVid | mIoU 67.1 | 14 |
| Semantic segmentation | RuralScapes 12 semantic classes (val) | mIoU 63.99 | 12 |
| Semantic segmentation | UAVid 8 semantic classes (val) | mIoU 79.31 | 12 |
| Video Semantic Segmentation | CamVid (val) | mIoU 67.1 | 4 |