Temporally Distributed Networks for Fast Video Semantic Segmentation

About

We present TDNet, a temporally distributed network designed for fast and accurate video semantic segmentation. We observe that features extracted from a certain high-level layer of a deep CNN can be approximated by composing features extracted from several shallower sub-networks. Leveraging the inherent temporal continuity in videos, we distribute these sub-networks over sequential frames. Therefore, at each time step, we only need to perform a lightweight computation to extract a sub-features group from a single sub-network. The full features used for segmentation are then recomposed by application of a novel attention propagation module that compensates for geometry deformation between frames. A grouped knowledge distillation loss is also introduced to further improve the representation power at both full and sub-feature levels. Experiments on Cityscapes, CamVid, and NYUD-v2 demonstrate that our method achieves state-of-the-art accuracy with significantly faster speed and lower latency.
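The scheduling idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes one sub-network runs per frame in round-robin order, that the most recent sub-feature groups are kept in a sliding window, and that older groups are aligned to the current frame by plain dot-product attention and then summed. The names `DistributedExtractor`, `attend`, and `make_subnet` are hypothetical, and the "sub-networks" are stand-in functions rather than CNNs.

```python
from collections import deque
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    # For each query vector, return a softmax-weighted average of `values`,
    # weighted by dot-product similarity between the query and each key.
    dim = len(values[0])
    out = []
    for q in query:
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def make_subnet(scale):
    # Hypothetical stand-in for a shallow sub-network: maps each "pixel"
    # of a 1-D frame to a 2-D feature vector.
    return lambda frame: [[scale * x, 1.0] for x in frame]


class DistributedExtractor:
    def __init__(self, subnets):
        self.subnets = subnets
        # Keep only the latest sub-feature group from each sub-network.
        self.window = deque(maxlen=len(subnets))
        self.t = 0

    def step(self, frame):
        # Run exactly one lightweight sub-network on the current frame.
        idx = self.t % len(self.subnets)
        self.window.append(self.subnets[idx](frame))
        self.t += 1

        current = self.window[-1]
        fused = [list(v) for v in current]
        # Align each earlier group to the current frame, then sum (assumption).
        for past in list(self.window)[:-1]:
            aligned = attend(current, past, past)
            fused = [[f + a for f, a in zip(fv, av)]
                     for fv, av in zip(fused, aligned)]
        return idx, fused


extractor = DistributedExtractor([make_subnet(1.0), make_subnet(2.0)])
for frame in ([0.1, 0.2, 0.3], [0.2, 0.3, 0.4], [0.3, 0.4, 0.5]):
    idx, full_features = extractor.step(frame)
    print(idx, len(full_features))
```

The per-frame cost in this sketch is one sub-network call plus the attention over the window, which mirrors the abstract's claim that only a lightweight computation is needed at each time step. The grouped knowledge distillation loss is not modeled here.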

Ping Hu, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Stan Sclaroff, Federico Perazzi • 2020

Related benchmarks

Task                              Dataset                Result       Rank
Semantic segmentation             Cityscapes (test)      mIoU 74.9    1145
Semantic segmentation             Cityscapes (val)       mIoU 79.9    572
Semantic segmentation             CamVid (test)          mIoU 76      411
Semantic segmentation             Cityscapes (val)       mIoU 75      332
Semantic segmentation             NYU Depth V2 (test)    mIoU 43.5    172
Video Semantic Segmentation       Cityscapes (val)       mIoU 79.9    91
Semantic segmentation             UAVid (test)           mIoU 52      37
Semantic segmentation             NYU Depth V2           --           26
Semantic Video Segmentation       Cityscapes (test)      mIoU 79.4    24
Surgical Instrument Segmentation  EndoVis 2017 (test)    mIoU 49.24   22
Showing 10 of 14 rows
