Unsupervised Hierarchical Semantic Segmentation with Multiview Cosegmentation and Clustering Transformers

About

Unsupervised semantic segmentation aims to discover groupings within and across images that capture object- and view-invariance of a category without external supervision. Grouping naturally has levels of granularity, creating ambiguity in unsupervised segmentation. Existing methods avoid this ambiguity and treat it as a factor outside modeling, whereas we embrace it and desire hierarchical grouping consistency for unsupervised segmentation. We approach unsupervised segmentation as a pixel-wise feature learning problem. Our idea is that a good representation shall reveal not just a particular level of grouping, but any level of grouping in a consistent and predictable manner. We enforce spatial consistency of grouping and bootstrap feature learning with co-segmentation among multiple views of the same image, and enforce semantic consistency across the grouping hierarchy with clustering transformers between coarse- and fine-grained features. We deliver the first data-driven unsupervised hierarchical semantic segmentation method called Hierarchical Segment Grouping (HSG). Capturing visual similarity and statistical co-occurrences, HSG also outperforms existing unsupervised segmentation methods by a large margin on five major object- and scene-centric benchmarks. Our code is publicly available at https://github.com/twke18/HSG.
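
To make "hierarchical grouping consistency" concrete, here is a minimal sketch of one way to read it: a coarse segmentation is consistent with a fine one when every fine segment lies inside exactly one coarse segment. This is an illustrative check only, not the HSG training objective or the clustering transformer. The function names `coarsen` and `is_consistent_hierarchy`, the flattened label-map representation, and the `parent_of` mapping are all assumptions introduced for this example.

```python
from collections import defaultdict


def coarsen(fine, parent_of):
    """Map each pixel's fine segment id to its (hypothetical) coarse parent id."""
    return [parent_of[f] for f in fine]


def is_consistent_hierarchy(fine, coarse):
    """Return True if every fine segment falls inside exactly one coarse segment.

    `fine` and `coarse` are equal-length sequences of per-pixel segment ids,
    i.e. flattened label maps. Illustrative check, not the HSG loss.
    """
    if len(fine) != len(coarse):
        raise ValueError("label maps must cover the same pixels")
    parents = defaultdict(set)
    for f, c in zip(fine, coarse):
        parents[f].add(c)  # collect every coarse id seen under this fine id
    return all(len(p) == 1 for p in parents.values())


# Toy example: six pixels, three fine segments merged into two coarse ones.
fine = [0, 0, 1, 1, 2, 2]
parent_of = {0: 0, 1: 0, 2: 1}  # assumed fine-to-coarse assignment
coarse = coarsen(fine, parent_of)
```

A coarse map built by merging fine segments passes the check by construction. A coarse map produced independently of the fine one may split a fine segment and fail it, which is the kind of inconsistency the abstract says a good representation should avoid.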

Tsung-Wei Ke, Jyh-Jing Hwang, Yunhui Guo, Xudong Wang, Stella X. Yu • 2022

Related benchmarks

Task	Dataset	Result
Semantic segmentation	PASCAL VOC 2012 (test)	mIoU 41.9	1477
Semantic segmentation	PASCAL VOC (val)	mIoU 41.9	380
Semantic segmentation	Cityscapes-C (val)	mIoU 32.5	56
Unsupervised image segmentation	Coco-Stuff (test)	Accuracy 57.6	26
Unsupervised image segmentation	Potsdam (test)	Accuracy 67.4	15
Semantic segmentation	Cityscapes (val)	mIoU 32.5	5
Semantic segmentation	KITTI-STEP (val)	mIoU 21.7	5
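
For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of mean intersection-over-union on flattened label maps. It is not the evaluation code behind these numbers. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions made for this example. Unsupervised methods are commonly scored only after predicted clusters have been matched to ground-truth classes, and that matching step is omitted here.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes, for flattened label maps.

    Classes that appear in neither `pred` nor `target` are skipped
    (an assumption of this sketch; evaluation protocols differ).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for k in range(num_classes):
        # Pixels labelled k in both maps.
        inter = sum(1 for p, t in zip(pred, target) if p == k and t == k)
        # Pixels labelled k in at least one map.
        union = sum(1 for p, t in zip(pred, target) if p == k or t == k)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: class 0 has IoU 1/2, class 1 has IoU 2/3.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case the score is the average of 1/2 and 2/3, which is 7/12, or about 58.3 when expressed as a percentage like the table entries.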
