Self-Supervised Visual Representation Learning from Hierarchical Grouping

About

We create a framework for bootstrapping visual representation learning from a primitive visual grouping capability. We operationalize grouping via a contour detector that partitions an image into regions, followed by merging of those regions into a tree hierarchy. A small supervised dataset suffices for training this grouping primitive. Across a large unlabeled dataset, we apply this learned primitive to automatically predict hierarchical region structure. These predictions serve as guidance for self-supervised contrastive feature learning: we task a deep network with producing per-pixel embeddings whose pairwise distances respect the region hierarchy. Experiments demonstrate that our approach can serve as state-of-the-art generic pre-training, benefiting downstream tasks. We additionally explore applications to semantic region search and video-based object instance tracking.

Xiao Zhang, Michael Maire• 2020

Related benchmarks

Task	Dataset	Result
Semantic segmentation	PASCAL VOC 2012 (test)	mIoU64.7	1477
Semantic segmentation	PASCAL VOC (val)	mIoU48.8	380
Semantic segmentation	PASCAL (val)	mIoU48.8	25
Semantic Segment Retrieval	PASCAL (val)	mIoU (7 classes)24.6	10

Showing 4 of 4 rows

Other info

Follow for update

@wizwand_team Discord