Semantic Flow for Fast and Accurate Scene Parsing
About
In this paper, we focus on designing an effective method for fast and accurate scene parsing. A common practice to improve performance is to obtain high-resolution feature maps with strong semantic representation. Two widely used strategies, atrous convolutions and feature pyramid fusion, are either computationally intensive or ineffective. Inspired by optical flow for motion alignment between adjacent video frames, we propose a Flow Alignment Module (FAM) that learns Semantic Flow between feature maps of adjacent levels and broadcasts high-level features to high-resolution features effectively and efficiently. Furthermore, integrating our module into a common feature pyramid structure yields superior performance over other real-time methods, even on lightweight backbone networks such as ResNet-18. Extensive experiments are conducted on several challenging datasets, including Cityscapes, PASCAL Context, ADE20K and CamVid. In particular, our network is the first to achieve 80.4% mIoU on Cityscapes at a frame rate of 26 FPS. The code is available at https://github.com/lxtGH/SFSegNets.
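To make the warping idea concrete, here is a minimal pure-Python sketch of how a flow field could carry a coarse, high-level feature map onto a finer grid by bilinear sampling. It is an illustration under stated assumptions, not the authors' implementation. The function names `bilinear_sample` and `warp_with_flow` are hypothetical. The flow is passed in as an argument rather than predicted by a learned layer, the feature map has a single channel, and the corner-aligned coordinate mapping and border clamping are choices made for this example.

```python
import math


def bilinear_sample(feat, x, y):
    """Sample the 2D grid `feat` (a list of rows) at a real-valued (x, y).

    Coordinates outside the grid are clamped to the border (an assumption).
    """
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = feat[y0][x0] * (1 - ax) + feat[y0][x1] * ax
    bottom = feat[y1][x0] * (1 - ax) + feat[y1][x1] * ax
    return top * (1 - ay) + bottom * ay


def warp_with_flow(high_level, flow):
    """Warp a coarse feature map onto the finer grid defined by `flow`.

    `flow[y][x]` is a (dx, dy) offset, in coarse-grid pixels, added to the
    position that plain upsampling would read from. With an all-zero flow
    this reduces to ordinary bilinear upsampling.
    """
    h_in, w_in = len(high_level), len(high_level[0])
    h_out, w_out = len(flow), len(flow[0])
    # Corner-aligned mapping from fine-grid to coarse-grid coordinates.
    sx = (w_in - 1) / (w_out - 1) if w_out > 1 else 0.0
    sy = (h_in - 1) / (h_out - 1) if h_out > 1 else 0.0
    out = []
    for y in range(h_out):
        row = []
        for x in range(w_out):
            dx, dy = flow[y][x]
            row.append(bilinear_sample(high_level, x * sx + dx, y * sy + dy))
        out.append(row)
    return out


if __name__ == "__main__":
    coarse = [[0.0, 2.0], [4.0, 6.0]]
    zero_flow = [[(0.0, 0.0)] * 3 for _ in range(3)]
    print(warp_with_flow(coarse, zero_flow))
```

In this sketch a nonzero offset at a fine-grid pixel shifts where that pixel reads from in the coarse map, which is the sense in which the flow aligns the two levels before they are fused.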
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | ADE20K (val) | mIoU | 44.67 | 2731 |
| Semantic segmentation | Cityscapes (test) | mIoU | 81.8 | 1145 |
| Semantic segmentation | CamVid (test) | mIoU | 73.8 | 411 |
| Semantic segmentation | PASCAL Context (val) | mIoU | 45.52 | 323 |
| Semantic segmentation | BDD100K (test) | mIoU | 60.6 | 58 |
| Semantic segmentation | PASCAL-Context 60 classes (test) | mIoU | 53.8 | 54 |
| Object Segmentation | iSAID (val) | mIoU | 64.3 | 42 |
| Semantic segmentation | Cityscapes (val) | mIoU | 78.3 | 18 |
| Scene Parsing | Cityscapes (test) | mIoU | 80.4 | 17 |
| Semantic segmentation | Vaihingen (val) | mIoU | 67.6 | 17 |