Built-in Foreground/Background Prior for Weakly-Supervised Semantic Segmentation

About

Pixel-level annotations are expensive and time-consuming to obtain. Hence, weak supervision using only image tags could have a significant impact on semantic segmentation. Recently, CNN-based methods have been proposed that fine-tune pre-trained networks using image tags. Without additional information, this leads to poor localization accuracy. This problem, however, was alleviated by making use of objectness priors to generate foreground/background masks. Unfortunately, these priors either require pixel-level annotations or bounding boxes for training, or still yield inaccurate object boundaries. Here, we propose a novel method to extract markedly more accurate masks from the pre-trained network itself, forgoing external objectness modules. This is accomplished using the activations of the higher-level convolutional layers, smoothed by a dense CRF. We demonstrate that our method, based on these masks and a weakly-supervised loss, outperforms the state-of-the-art tag-based weakly-supervised semantic segmentation techniques. Furthermore, we introduce a new form of inexpensive weak supervision yielding an additional accuracy boost.
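To make the mask-extraction idea concrete, here is a minimal sketch of turning convolutional activations into a coarse foreground/background mask. It is not the authors' implementation: the abstract only says that higher-level activations are used and then smoothed by a dense CRF, so the channel averaging, min-max normalization, and fixed threshold below are assumptions chosen for illustration, the function name `foreground_mask` is hypothetical, and the dense CRF step is omitted entirely.

```python
def foreground_mask(activations, threshold=0.5):
    """Coarse foreground mask from a stack of activation maps.

    activations: list of C channel maps, each an H x W list of floats.
    threshold:   assumed cut-off on the normalized response (illustrative).
    """
    num_channels = len(activations)
    height, width = len(activations[0]), len(activations[0][0])

    # Assumption: fuse channels by averaging them into one 2D response map.
    fused = [
        [sum(ch[y][x] for ch in activations) / num_channels for x in range(width)]
        for y in range(height)
    ]

    # Min-max normalize to [0, 1]; guard against a constant map.
    lo = min(min(row) for row in fused)
    hi = max(max(row) for row in fused)
    scale = (hi - lo) or 1.0

    # 1 = foreground, 0 = background. A dense CRF would refine this mask.
    return [[1 if (v - lo) / scale > threshold else 0 for v in row] for row in fused]


# Toy input: 2 channels of size 2 x 2, with strong responses in the bottom row.
acts = [
    [[0.0, 0.2], [0.9, 1.0]],
    [[0.1, 0.0], [0.7, 1.0]],
]
mask = foreground_mask(acts)
```

In this toy input the bottom row responds strongly in both channels, so it is the part marked as foreground. In a real pipeline the activations would come from a pre-trained network and the mask would be upsampled to the image resolution before any CRF smoothing.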

Fatemehsadat Saleh, Mohammad Sadegh Ali Akbarian, Mathieu Salzmann, Lars Petersson, Stephen Gould, Jose M. Alvarez • 2016

Related benchmarks

Task	Dataset	Result
Semantic segmentation	PASCAL VOC 2012 (val)	Mean IoU 51.5	2204
Semantic segmentation	PASCAL VOC 2012 (test)	mIoU 48	1477
Semantic segmentation	COCO 2014 (val)	mIoU 20.4	304
Semantic segmentation	COCO (val)	mIoU 20.4	185
Weakly supervised semantic segmentation	PASCAL VOC 2012 (val)	mIoU 46.6	168
Weakly supervised semantic segmentation	PASCAL VOC 2012 (test)	mIoU 48	158
Semantic segmentation	COCO Object (val)	mIoU 0.204	101
Weakly supervised semantic segmentation	MS-COCO 2014 (val)	mIoU 20.4	27
Weakly supervised semantic segmentation	VOC 2012 (val)	mIoU 51.5	19
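
For readers unfamiliar with the metric in the table, the following is a minimal sketch of mean intersection-over-union (mIoU) on flat lists of per-pixel class labels. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; benchmark evaluations typically accumulate intersections and unions over the whole dataset rather than per image. The table reports values on a percentage scale (for example 20.4) and, in one row, on what appears to be a fractional scale (0.204).

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two equal-length lists of integer labels."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # assumption: ignore classes that never occur
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 4 pixels and 2 classes.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)  # fraction in [0, 1]
score_percent = 100.0 * score                  # same value as a percentage
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.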
