Clustering by Maximizing Mutual Information Across Views

About

We propose a novel framework for image clustering that performs representation learning and clustering jointly. Our method has two heads that share the same backbone network: a "representation learning" head and a "clustering" head. The "representation learning" head captures fine-grained patterns of objects at the instance level, which serve as clues for the "clustering" head to extract coarse-grained information that separates objects into clusters. The whole model is trained end-to-end by minimizing the weighted sum of two sample-oriented contrastive losses applied to the outputs of the two heads. To ensure that the contrastive loss corresponding to the "clustering" head is optimal, we introduce a novel critic function called "log-of-dot-product". Extensive experiments show that our method significantly outperforms state-of-the-art single-stage clustering methods across a variety of image datasets, improving accuracy over the best baseline by about 5-7% on CIFAR10/20, STL10, and ImageNet-Dogs. Further, the "two-stage" variant of our method also outperforms baselines on three challenging ImageNet subsets.
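The training objective described above can be sketched as two InfoNCE-style losses over a batch of paired augmented views, where the two views of the same image form the positive pair and the other images in the batch act as negatives. This is a minimal NumPy sketch under stated assumptions, not the authors' code: the clustering head is assumed to output softmax cluster probabilities `q`, and the "log-of-dot-product" critic is taken to be `log(q_i . q_j)`. The temperature-scaled cosine critic for the representation head, the placement of the `weight` factor on the representation loss, and every function name here are illustrative assumptions.

```python
import numpy as np


def softmax(logits, axis=-1):
    # Numerically stable softmax: turns clustering-head logits into cluster probabilities.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def logsumexp(x, axis=-1):
    # Stable log(sum(exp(x))) along `axis`, returned with that axis removed.
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def info_nce(scores):
    # scores[i, j] = critic(view1_i, view2_j); positive pairs lie on the diagonal.
    # Loss = mean_i [ -log( exp(scores[i, i]) / sum_j exp(scores[i, j]) ) ].
    n = scores.shape[0]
    idx = np.arange(n)
    return float(np.mean(logsumexp(scores, axis=1) - scores[idx, idx]))


def cosine_critic(z1, z2, temperature=0.1):
    # ASSUMPTION: temperature-scaled cosine similarity for the representation head.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    return z1 @ z2.T / temperature


def log_dot_product_critic(q1, q2, eps=1e-8):
    # "log-of-dot-product" critic on cluster-probability rows: log(q1_i . q2_j).
    # eps (an assumption) avoids log(0) when two probability vectors do not overlap.
    return np.log(q1 @ q2.T + eps)


def total_loss(z1, z2, cluster_logits1, cluster_logits2, weight=1.0):
    # Hypothetical combined objective: clustering loss + weight * representation loss.
    q1, q2 = softmax(cluster_logits1), softmax(cluster_logits2)
    l_cluster = info_nce(log_dot_product_critic(q1, q2))
    l_repr = info_nce(cosine_critic(z1, z2))
    return l_cluster + weight * l_repr, l_cluster, l_repr
```

With the log-of-dot-product critic, the softmax inside InfoNCE reduces to `q_i . q'_i / sum_j q_i . q'_j`, so the clustering loss directly rewards two views of the same image for placing their probability mass on the same clusters.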

Kien Do, Truyen Tran, Svetha Venkatesh, 2021

Related benchmarks

Task	Dataset	Metric	Result
Image Clustering	CIFAR-10	NMI	0.679	318
Image Clustering	STL-10	ACC	81.8	282
Image Clustering	ImageNet-10	NMI	0.831	220
Clustering	Imagenet Dogs	NMI	48.4	105
Clustering	STL-10	ACC	81.8	64
Clustering	CIFAR-10	ACC	79.9	52
Image Clustering	CIFAR-20	NMI	41.6	43
Image Clustering	Tiny-ImageNet	ACC	0.153	37
Clustering	CIFAR100	Clustering Accuracy	42.5	31
Clustering	ImageNet dog	Clustering Accuracy	46.1	9
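The table reports clustering accuracy (ACC) and normalized mutual information (NMI). Because cluster indices are arbitrary, ACC is commonly computed after finding the one-to-one mapping from predicted clusters to ground-truth classes that maximizes agreement (Hungarian matching). The sketch below is a minimal illustration of that common evaluation, assuming integer labels starting at 0 and the arithmetic-mean normalization for NMI; the listing does not state which normalization was used, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def contingency(y_true, y_pred):
    # counts[p, t] = number of samples assigned to cluster p whose true class is t.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    counts = np.zeros((k, k), dtype=np.int64)
    np.add.at(counts, (y_pred, y_true), 1)
    return counts


def clustering_accuracy(y_true, y_pred):
    # Best one-to-one cluster-to-class mapping via the Hungarian algorithm.
    counts = contingency(y_true, y_pred)
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / len(y_true)


def nmi(y_true, y_pred):
    # NMI = I(Y; C) / ((H(Y) + H(C)) / 2). Undefined if both labelings are constant.
    pxy = contingency(y_true, y_pred).astype(float)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    return mi / ((hx + hy) / 2)
```

For example, a prediction that only permutes the cluster indices, such as `clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])`, scores 1.0 on both metrics. Note that the listing mixes scales: some scores appear as fractions (NMI 0.679) and others as percentages (ACC 81.8), while these functions return fractions.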
