Unsupervised Visual Representation Learning by Synchronous Momentum Grouping

About

In this paper, we propose a genuine group-level contrastive visual representation learning method whose linear evaluation performance on ImageNet surpasses that of vanilla supervised learning. The two mainstream unsupervised learning schemes are the instance-level contrastive framework and clustering-based methods. The former adopts extremely fine-grained instance-level discrimination, whose supervisory signal is inefficient because of false negatives. Although the latter avoid this problem, they commonly come with restrictions that hurt performance. To integrate the advantages of both, we design the SMoG method. SMoG follows the framework of contrastive learning but replaces the contrastive unit, switching from instances to groups in the manner of clustering-based methods. To achieve this, we propose a momentum grouping scheme that conducts feature grouping synchronously with representation learning. In this way, SMoG solves the problem of supervisory signal hysteresis that clustering-based methods usually face, and reduces the false negatives of instance contrastive methods. We conduct exhaustive experiments to show that SMoG works well on both CNN and Transformer backbones. The results show that SMoG surpasses the current SOTA unsupervised representation learning methods. Moreover, its linear evaluation results surpass those obtained by vanilla supervised learning, and the representation transfers well to downstream tasks.
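The abstract describes the idea only at a high level, so the following is a rough NumPy sketch of what "group-level contrast with synchronous momentum grouping" could look like, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the nearest-group assignment by cosine similarity, the momentum coefficient `beta`, the `temperature`, and the choice to take targets from the second view. It shows one step in which features are assigned to groups, a cross-entropy loss contrasts one view against the group features, and the group features are updated by momentum in the same step rather than in a separate clustering phase.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def assign_groups(features, groups):
    """Assign each feature to its most similar group (cosine similarity)."""
    return np.argmax(features @ groups.T, axis=1)

def momentum_update(groups, features, assignments, beta=0.99):
    """Move each assigned group toward the mean of its features (assumed rule)."""
    new_groups = groups.copy()
    for g in np.unique(assignments):
        mean_feat = features[assignments == g].mean(axis=0)
        new_groups[g] = beta * groups[g] + (1.0 - beta) * mean_feat
    return l2_normalize(new_groups)

def group_contrastive_loss(features, groups, targets, temperature=0.1):
    """Cross-entropy over groups: pull each feature toward its target group."""
    logits = features @ groups.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def smog_like_step(view_a, view_b, groups, beta=0.99, temperature=0.1):
    """One hypothetical step: group, contrast, and update groups together."""
    za, zb = l2_normalize(view_a), l2_normalize(view_b)
    targets = assign_groups(zb, groups)          # groups from the second view
    loss = group_contrastive_loss(za, groups, targets, temperature)
    groups = momentum_update(groups, zb, targets, beta)  # synchronous update
    return loss, groups
```

In a real training loop the two views would come from an encoder applied to two augmentations of the same images, and the loss would be backpropagated through the encoder; here plain arrays stand in for those features.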

Bo Pang, Yifan Zhang, Yaoyi Li, Jia Cai, Cewu Lu • 2022

Related benchmarks

Task	Dataset	Result
Image Classification	ImageNet (val)	Top-1 Acc 77.7	1206
Semantic segmentation	Cityscapes	mIoU 76.03	674
Semantic segmentation	VOC 2012	mIoU 76.22	71
Object Detection	COCO standard 2017 (train val)	AP (IoU 0.5:0.95) 40.1	64
Image Classification	ImageNet 1.0 (10% labeled)	--	33
Image Classification	ImageNet-1K 1.0 (1% labels)	Top-1 Acc 63.6	28
Instance Segmentation	COCO 2017 (train/val)	AP (Mask) 36.9	21
Image Classification	ImageNet 100% labels 1.0	Top-1 Accuracy 80.2	7
