
CrOC: Cross-View Online Clustering for Dense Visual Representation Learning

About

Learning dense visual representations without labels is difficult, and more so from scene-centric data. We tackle this problem with a Cross-view consistency objective and an Online Clustering mechanism (CrOC) that discovers and segments the semantics of the views. Because it relies on no hand-crafted priors, the resulting method is more generalizable and requires no cumbersome pre-processing step. More importantly, the clustering algorithm operates jointly on the features of both views, which sidesteps two issues: content that is not represented in both views, and ambiguous matching of objects from one crop to the other. We demonstrate strong performance on linear and unsupervised segmentation transfer tasks on various datasets, and likewise on video object segmentation. Our code and pre-trained models are publicly available at https://github.com/stegmuel/CrOC.
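
To make the idea of clustering both views in one shared pool concrete, here is a minimal sketch. It is not the authors' algorithm: plain k-means with a deterministic farthest-point initialisation stands in for the paper's online clustering, and the names `farthest_point_init`, `joint_kmeans`, and `view_centroids` are hypothetical helpers introduced only for this illustration. Dense features are assumed to be small lists of floats, one per patch or token.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def farthest_point_init(pool, k):
    """Deterministic init: start at pool[0], then add the farthest point."""
    centroids = [pool[0]]
    while len(centroids) < k:
        centroids.append(
            max(pool, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def joint_kmeans(feats_a, feats_b, k, iters=10):
    """Cluster the tokens of BOTH views in one pool (stand-in: plain k-means).

    Returns one cluster assignment list per view.
    """
    pool = feats_a + feats_b
    centroids = farthest_point_init(pool, k)
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in pool]
        for c in range(k):
            members = [p for p, a in zip(pool, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    assign = [nearest(p, centroids) for p in pool]
    return assign[: len(feats_a)], assign[len(feats_a):]


def view_centroids(feats, assign):
    """Per-view centroid of every cluster that appears in this view."""
    return {
        c: mean([f for f, a in zip(feats, assign) if a == c])
        for c in set(assign)
    }


# Toy example: the third token of view B has no counterpart in view A.
feats_a = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
feats_b = [[0.0, 0.1], [5.1, 5.0], [9.0, 0.0]]
assign_a, assign_b = joint_kmeans(feats_a, feats_b, k=3)
cent_a = view_centroids(feats_a, assign_a)
cent_b = view_centroids(feats_b, assign_b)
# Only clusters present in both views would feed a cross-view consistency loss.
shared = sorted(set(cent_a) & set(cent_b))
```

Because the assignments come from a single clustering, a cluster index means the same thing in both views, so no separate matching step between crops is needed. Content that appears in only one view ends up in a cluster that is absent from the other view and is simply left out of `shared`.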

Thomas Stegmüller, Tim Lebailly, Behzad Bozorgtabar, Tinne Tuytelaars, Jean-Philippe Thiran • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Video Object Segmentation | DAVIS 2017 (val) | J mean 56.5 | 1130 |
| Semantic segmentation | ADE20K | mIoU 28.4 | 936 |
| Semantic segmentation | COCO Stuff (val) | mIoU 52.6 | 126 |
| Semantic segmentation | COCO Object (val) | mIoU 0.661 | 77 |
| Semantic segmentation | VOC 2012 (val) | mIoU 70.6 | 67 |
| Video Instance Parsing | VIP (val) | mIoU 26.1 | 20 |
| Unsupervised Semantic Segmentation | PASCAL VOC 2012 (val) | mIoU 20.6 | 15 |
| Unsupervised Segmentation | COCO Stuff (val) | mIoU 21.9 | 13 |
| Unsupervised Segmentation | COCO-Things (val) | mIoU 17.2 | 13 |
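
Most results above are reported as mIoU (mean intersection over union). For readers unfamiliar with the metric, the following is a minimal sketch of how it is commonly computed from flattened label maps. The function name `mean_iou` and the `ignore_index=255` default are assumptions made for illustration; the exact evaluation protocol behind each benchmark row is not stated on this page.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union over classes present in pred or target.

    `pred` and `target` are flat sequences of integer class labels,
    one per pixel. Pixels whose target equals `ignore_index` are skipped
    (an assumed convention, not taken from the page).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[t] += 1
            if 0 <= p < num_classes:
                union[p] += 1
    # Classes that never occur are excluded from the average.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with two classes and four pixels.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so `score` is their average, 7/12. Benchmarks usually report this value multiplied by 100.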
