Self-Supervised Visual Representation Learning from Hierarchical Grouping

About

We create a framework for bootstrapping visual representation learning from a primitive visual grouping capability. We operationalize grouping via a contour detector that partitions an image into regions, followed by merging of those regions into a tree hierarchy. A small supervised dataset suffices for training this grouping primitive. Across a large unlabeled dataset, we apply this learned primitive to automatically predict hierarchical region structure. These predictions serve as guidance for self-supervised contrastive feature learning: we task a deep network with producing per-pixel embeddings whose pairwise distances respect the region hierarchy. Experiments demonstrate that our approach can serve as state-of-the-art generic pre-training, benefiting downstream tasks. We additionally explore applications to semantic region search and video-based object instance tracking.
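To make "pairwise distances respect the region hierarchy" concrete, here is a minimal sketch in plain Python. It is not the authors' loss: the squared-error objective is a simple stand-in for their contrastive formulation, and the toy tree, its merge heights, and the names `TREE`, `tree_distance`, and `hierarchy_loss` are all hypothetical. The sketch assumes each pixel has already been assigned to a leaf region, and that the hierarchy distance between two regions is the height at which they first merge.

```python
import math

# Hypothetical toy hierarchy: node -> (parent, merge_height).
# Leaves (r0, r1, r2) are regions with height 0; the root has parent None.
# The heights are made-up values, standing in for contour strengths.
TREE = {
    "r0": ("a", 0.0),
    "r1": ("a", 0.0),
    "r2": ("root", 0.0),
    "a": ("root", 0.3),
    "root": (None, 1.0),
}


def ancestors(node):
    """Return the path from node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = TREE[node][0]
    return path


def tree_distance(u, v):
    """Merge height of the lowest common ancestor of regions u and v."""
    seen = set(ancestors(u))
    for node in ancestors(v):
        if node in seen:
            return TREE[node][1]
    raise ValueError("nodes are not in the same tree")


def hierarchy_loss(embeddings, regions):
    """Mean squared gap between embedding distance and tree distance.

    embeddings: one vector per pixel; regions: the leaf region of each pixel.
    Illustrative stand-in objective, averaged over all pixel pairs.
    """
    total, count = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d_embed = math.dist(embeddings[i], embeddings[j])
            d_tree = tree_distance(regions[i], regions[j])
            total += (d_embed - d_tree) ** 2
            count += 1
    return total / count if count else 0.0


# Three pixels, one per region. The first embedding roughly mirrors the tree;
# the second collapses every pixel to the same point.
regions = ["r0", "r1", "r2"]
good = hierarchy_loss([(0.0, 0.0), (0.3, 0.0), (0.0, 1.0)], regions)
bad = hierarchy_loss([(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)], regions)
print(good < bad)
```

Pixels in regions that merge early (r0 and r1) are pushed to sit close together, while pixels that only meet at the root are pushed far apart, so the embedding that mirrors the tree scores a lower loss than the collapsed one.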

Xiao Zhang, Michael Maire • 2020

Related benchmarks

Task                       | Dataset                 | Result                | Rank
Semantic segmentation      | PASCAL VOC 2012 (test)  | mIoU 64.7             | 1415
Semantic segmentation      | PASCAL VOC (val)        | mIoU 48.8             | 362
Semantic segmentation      | PASCAL (val)            | mIoU 48.8             | 25
Semantic Segment Retrieval | PASCAL (val)            | mIoU (7 classes) 24.6 | 10
