Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation
About
Existing studies in weakly-supervised semantic segmentation (WSSS) using image-level weak supervision have several limitations: sparse object coverage, inaccurate object boundaries, and co-occurring pixels from non-target objects. To overcome these challenges, we propose a novel framework, namely Explicit Pseudo-pixel Supervision (EPS), which learns from pixel-level feedback by combining two forms of weak supervision: the image-level label provides the object identity via the localization map, and the saliency map from an off-the-shelf saliency detection model provides rich boundaries. We devise a joint training strategy to fully exploit the complementary relationship between the two sources of information. Our method can obtain accurate object boundaries and discard co-occurring pixels, thereby significantly improving the quality of pseudo-masks. Experimental results show that the proposed method outperforms existing methods by resolving key challenges of WSSS and achieves new state-of-the-art performance on both the PASCAL VOC 2012 and MS COCO 2014 datasets.
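The abstract does not spell out how the two supervisions are combined, so the following is only a minimal sketch under stated assumptions, not the authors' formulation. It assumes that the localization maps of the classes named by the image-level label are summed into a foreground estimate, and that this estimate is compared with the saliency map by a mean squared error. The function names `foreground_estimate` and `saliency_loss`, and the flattened per-pixel lists, are hypothetical.

```python
def foreground_estimate(loc_maps, present):
    """Sum the localization maps of classes present in the image, clipped to 1.

    loc_maps: one list of per-pixel scores in [0, 1] per class (assumed layout).
    present:  one boolean per class, taken from the image-level label.
    """
    n_pixels = len(loc_maps[0])
    estimate = []
    for k in range(n_pixels):
        score = sum(m[k] for m, y in zip(loc_maps, present) if y)
        estimate.append(min(score, 1.0))
    return estimate


def saliency_loss(estimate, saliency):
    """Mean squared error between the foreground estimate and the saliency map."""
    return sum((e - s) ** 2 for e, s in zip(estimate, saliency)) / len(estimate)


# Toy image with three pixels: object, co-occurring context, background.
# Class 0 is present and wrongly fires on the context pixel (score 0.8).
loc_maps = [[0.9, 0.8, 0.1], [0.5, 0.5, 0.5]]
present = [True, False]
saliency = [1.0, 0.0, 0.0]  # the saliency map marks only the object pixel

loss = saliency_loss(foreground_estimate(loc_maps, present), saliency)
```

In this toy case most of the loss comes from the context pixel, which illustrates how pixel-level feedback from saliency could discourage activations on co-occurring regions such as the railroad in the title.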
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | PASCAL VOC 2012 (val) | Mean IoU 71 | 2040 |
| Semantic segmentation | PASCAL VOC 2012 (test) | mIoU 71.8 | 1342 |
| Semantic segmentation | PASCAL VOC (val) | mIoU 70.9 | 338 |
| Semantic segmentation | COCO 2014 (val) | mIoU 35.7 | 251 |
| Semantic segmentation | Pascal VOC (test) | mIoU 70.8 | 236 |
| Weakly supervised semantic segmentation | PASCAL VOC 2012 (test) | -- | 158 |
| Weakly supervised semantic segmentation | PASCAL VOC 2012 (val) | -- | 154 |
| Semantic segmentation | COCO (val) | mIoU 35.7 | 135 |
| Semantic segmentation | PASCAL VOC 2012 (train) | mIoU 71.6 | 73 |
| Semantic segmentation | VOC 2012 (val) | mIoU 70.9 | 67 |
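
The results above are reported as mean intersection-over-union (mIoU). As a rough illustration of the metric, the sketch below computes it for flattened lists of predicted and ground-truth class indices. The function name `mean_iou` is hypothetical, and benchmark-specific details (accumulating counts over the whole dataset, skipping ignored pixels) are left out.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that occur in pred or target."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel counts once toward both intersection and union of its class.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

The leaderboard values correspond to this quantity expressed as a percentage.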