Fast-SCNN: Fast Semantic Segmentation Network
About
The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024x2048px) suited to efficient computation on embedded devices with low memory. Building on existing two-branch methods for fast segmentation, we introduce our "learning to downsample" module which computes low-level features for multiple resolution branches simultaneously. Our network combines spatial detail at high resolution with deep features extracted at lower resolution, yielding an accuracy of 68.0% mean intersection over union at 123.5 frames per second on Cityscapes. We also show that large scale pre-training is unnecessary. We thoroughly validate our metric in experiments with ImageNet pre-training and the coarse labeled data of Cityscapes. Finally, we show even faster computation with competitive results on subsampled inputs, without any network modifications.
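The accuracy figures quoted here are mean intersection over union (mIoU) scores. As a rough illustration of how that metric is defined, the sketch below computes it over flat lists of per-pixel class labels. The function name `mean_iou`, its parameters, and the ignore label of 255 are assumptions made for this example, not code from the paper; official benchmark scripts accumulate counts over the whole dataset rather than a single image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection over union for flat sequences of class labels.

    Hypothetical helper for illustration only. `ignore_index` marks
    unlabeled pixels that are excluded from scoring (255 is assumed here).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count toward any class
        if p == t:
            # Correct pixel: belongs to both prediction and ground truth.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of the true class and,
            # if valid, of the predicted class.
            union[t] += 1
            if 0 <= p < num_classes:
                union[p] += 1
    # Average only over classes that appear in prediction or ground truth.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    # Per-class IoU: class 0 = 1/2, class 1 = 2/3, class 2 = 1/1.
    print(round(mean_iou(pred, target, num_classes=3), 4))  # 0.7222
```

In the small example, the last pixel is ignored, so the score is the average of 1/2, 2/3, and 1, which is 13/18.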
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Semantic segmentation | Cityscapes (test) | mIoU | 68 | 1145 |
| Semantic segmentation | Cityscapes (val) | mIoU | 69.1 | 572 |
| Semantic segmentation | Cityscapes (val) | mIoU | 69.1 | 133 |
| Semantic segmentation | Cityscapes (val) | mIoU | 68.6 | 108 |
| Semantic segmentation | Trans10K v2 (test) | mIoU | 51.93 | 104 |
| Semantic segmentation | PST900 (test) | mIoU | 48.22 | 72 |
| Semantic segmentation | DensePASS (test) | mIoU | 24.6 | 51 |
| Semantic segmentation | Cityscapes fine (test) | mIoU | 68 | 44 |
| Semantic segmentation | Stanford2D3D Panoramic 1.0 (Fold-1) | mIoU | 26.86 | 43 |
| Semantic segmentation | Cityscapes | Throughput (FPS) | 485.4 | 42 |