Pixel-level Encoding and Depth Layering for Instance-level Semantic Labeling
About
Recent approaches for instance-aware semantic labeling have augmented convolutional neural networks (CNNs) with complex multi-task architectures or computationally expensive graphical models. We present a method that leverages a fully convolutional network (FCN) to predict semantic labels, depth and an instance-based encoding using each pixel's direction towards its corresponding instance center. Subsequently, we apply low-level computer vision techniques to generate state-of-the-art instance segmentation on the street scene datasets KITTI and Cityscapes. Our approach outperforms existing works by a large margin and can additionally predict absolute distances of individual instances from a monocular image as well as a pixel-level semantic labeling.
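The abstract encodes instances by giving each pixel its direction towards the center of the instance it belongs to. The sketch below shows what such a target could look like when it is derived from a ground-truth instance-ID map. Several details are assumptions, not taken from the abstract. The instance center is the mean pixel coordinate. The direction is quantized into `num_bins` angular classes centered on evenly spaced directions, and `num_bins=8` is only an illustrative default. Background has ID 0. The function names `instance_centers` and `direction_encoding` are hypothetical. It is an illustration of the idea, not the authors' implementation.

```python
import math
from typing import Dict, List, Optional, Tuple

Grid = List[List[int]]


def instance_centers(ids: Grid) -> Dict[int, Tuple[float, float]]:
    """Mean (row, col) coordinate of each instance (assumption: center = centroid).

    ID 0 is treated as background and skipped.
    """
    sums: Dict[int, List[float]] = {}
    for r, row in enumerate(ids):
        for c, inst in enumerate(row):
            if inst == 0:
                continue
            acc = sums.setdefault(inst, [0.0, 0.0, 0.0])  # [sum_row, sum_col, count]
            acc[0] += r
            acc[1] += c
            acc[2] += 1
    return {k: (a[0] / a[2], a[1] / a[2]) for k, a in sums.items()}


def direction_encoding(ids: Grid, num_bins: int = 8) -> List[List[Optional[int]]]:
    """Per-pixel direction class pointing from the pixel to its instance center.

    Angles use image coordinates (rows grow downward): bin 0 = center lies to
    the right, bin num_bins/4 = center lies below, and so on counter-clockwise in
    (dx, dy) space. Bins are centered on these directions (hypothetical choice).
    Background pixels get None; a pixel exactly at the center gets bin 0
    because math.atan2(0, 0) == 0.
    """
    centers = instance_centers(ids)
    width = 2 * math.pi / num_bins
    out: List[List[Optional[int]]] = []
    for r, row in enumerate(ids):
        out_row: List[Optional[int]] = []
        for c, inst in enumerate(row):
            if inst == 0:
                out_row.append(None)
                continue
            cr, cc = centers[inst]
            angle = math.atan2(cr - r, cc - c) % (2 * math.pi)  # map to [0, 2*pi)
            # Round to the nearest bin center; the final modulo folds 2*pi back to bin 0.
            out_row.append(round(angle / width) % num_bins)
        out.append(out_row)
    return out


if __name__ == "__main__":
    # One 3x3 instance: every border pixel points inward to the center (1, 1).
    for line in direction_encoding([[1, 1, 1], [1, 1, 1], [1, 1, 1]]):
        print(line)
```

For the 3x3 example, the left-middle pixel gets bin 0 (center to the right), the top-middle pixel gets bin 2 (center below), and the corners get the odd diagonal bins. Rounding to bin centers, rather than flooring, keeps the axis-aligned and diagonal directions away from bin boundaries, so small floating-point errors in `atan2` cannot flip their class.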
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | Cityscapes (test) | mIoU | 64.3 | 1145 |
| Panoptic segmentation | Cityscapes (val) | -- | -- | 276 |
| Instance segmentation | Cityscapes (val) | AP | 9.9 | 239 |
| Instance segmentation | Cityscapes (test) | AP (overall) | 8.9 | 122 |
| Panoptic segmentation | Cityscapes (test) | -- | -- | 51 |