Automatically Discovering and Learning New Visual Categories with Ranking Statistics

About

We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes. This setting is similar to semi-supervised learning, but significantly harder because there are no labelled examples for the new classes. The challenge, then, is to leverage the information contained in the labelled images in order to learn a general-purpose clustering model and use the latter to identify the new classes in the unlabelled data. In this work we address this problem by combining three ideas: (1) we suggest that the common approach of bootstrapping an image representation using only the labelled data introduces an unwanted bias, and that this can be avoided by using self-supervised learning to train the representation from scratch on the union of labelled and unlabelled data; (2) we use rank statistics to transfer the model's knowledge of the labelled classes to the problem of clustering the unlabelled images; and (3) we train the data representation by optimizing a joint objective function on the labelled and unlabelled subsets of the data, improving both the supervised classification of the labelled data and the clustering of the unlabelled data. We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin.

Kai Han, Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Andrea Vedaldi, Andrew Zisserman • 2020
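
Idea (2) in the abstract, transferring knowledge through rank statistics, can be illustrated with a small sketch. The abstract does not spell out the procedure, so the following assumes one plausible reading: two unlabelled images are treated as belonging to the same class when the sets of their top-k feature dimensions, ranked by magnitude, coincide. The function names, the toy feature vectors, and the choice of k are hypothetical and serve only as illustration.

```python
from typing import List, Sequence, Set


def top_k_dims(features: Sequence[float], k: int) -> Set[int]:
    """Return the indices of the k largest-magnitude dimensions (order ignored)."""
    ranked = sorted(range(len(features)), key=lambda i: abs(features[i]), reverse=True)
    return set(ranked[:k])


def pairwise_pseudo_labels(batch: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Entry [i][j] is 1 when samples i and j share the same top-k set, else 0."""
    tops = [top_k_dims(f, k) for f in batch]
    return [[int(a == b) for b in tops] for a in tops]


# Toy feature vectors (assumed values, not taken from the paper).
batch = [
    [0.9, 0.1, 0.8, 0.0],
    [0.7, 0.2, 0.95, 0.1],
    [0.0, 0.9, 0.1, 0.8],
]
labels = pairwise_pseudo_labels(batch, k=2)
for row in labels:
    print(row)
```

In this toy batch the first two vectors share their two strongest dimensions and so receive a positive pairwise label, while the third does not. Such pairwise labels could then supervise a clustering head on the unlabelled data, which is the role the abstract assigns to rank statistics.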

Related benchmarks

Task	Dataset	Metric	Result
Image Classification	FGVC-Aircraft (test)	Accuracy	11.1	322
Generalized Category Discovery	ImageNet-100	All Accuracy	37.1	236
Generalized Category Discovery	CIFAR-100	Accuracy (All)	58.2	233
Generalized Category Discovery	CIFAR-10	All Accuracy	46.8	152
Generalized Category Discovery	CUB-200 (test)	Overall Accuracy	33.3	81
Image Classification	Oxford-IIIT Pet (test)	Overall Accuracy	11.1	59
Generalized Category Discovery	Herbarium19 (test)	Score (All Categories)	27.9	52
Fine-grained Image Classification	FGVC Aircraft	Accuracy (All)	26.9	50
Open-world semi-supervised learning	CIFAR-100 (test)	Overall Accuracy	23.1	40
Fine-grained Image Classification	CUB-200	Accuracy (All)	33.3	39

Showing 10 of 60 rows
