Joint Representation Learning and Novel Category Discovery on Single- and Multi-modal Data
About
This paper studies the problem of novel category discovery on single- and multi-modal data with labels from different but relevant categories. We present a generic, end-to-end framework to jointly learn a reliable representation and assign clusters to unlabelled data. To avoid over-fitting the learnt embedding to labelled data, we take inspiration from self-supervised representation learning by noise-contrastive estimation and extend it to jointly handle labelled and unlabelled data. In particular, we propose using category discrimination on labelled data and cross-modal discrimination on multi-modal data to augment the instance discrimination used in conventional contrastive learning approaches. We further employ the Winner-Take-All (WTA) hashing algorithm on the shared representation space to generate pairwise pseudo labels for unlabelled data, to better predict cluster assignments. We thoroughly evaluate our framework on the large-scale multi-modal video benchmarks Kinetics-400 and VGG-Sound, and on the image benchmarks CIFAR10, CIFAR100 and ImageNet, obtaining state-of-the-art results.
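The pairwise pseudo-labelling step can be illustrated with a minimal sketch of WTA hashing. This is not the authors' implementation: the function names (`make_permutations`, `wta_hash`, `pairwise_pseudo_labels`) and the parameters `num_hashes`, `k` and `min_matches` are assumptions chosen for illustration, and the rule "label a pair as positive when enough hash codes agree" is one plausible way to turn code agreement into a pseudo label. Each hash permutes the embedding, looks at the first `k` entries, and records which one is largest, so the code depends only on the rank order of the features.

```python
import random


def make_permutations(dim, num_hashes, seed=0):
    """Draw `num_hashes` random permutations of the feature indices."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        perm = list(range(dim))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def wta_hash(vec, perms, k):
    """One code per permutation: position of the max among the first k permuted entries."""
    codes = []
    for perm in perms:
        window = [vec[i] for i in perm[:k]]
        codes.append(max(range(k), key=window.__getitem__))
    return codes


def pairwise_pseudo_labels(embeddings, perms, k, min_matches):
    """Label a pair 1 if at least `min_matches` hash codes agree, else 0 (assumed rule)."""
    codes = [wta_hash(e, perms, k) for e in embeddings]
    n = len(codes)
    labels = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            matches = sum(a == b for a, b in zip(codes[i], codes[j]))
            labels[i][j] = 1 if matches >= min_matches else 0
    return labels


# Toy embeddings: b has the same rank order as a, c has the reversed rank order.
a = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]
b = [2 * x + 0.01 for x in a]
c = [-x for x in a]

perms = make_permutations(dim=6, num_hashes=8)
labels = pairwise_pseudo_labels([a, b, c], perms, k=3, min_matches=4)
print(labels)
```

Because WTA codes depend only on the ordering of feature values, `a` and `b` receive identical codes and are paired, while `c` never agrees with either (its maximum in every window is their minimum), so it is paired only with itself.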
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Generalized Category Discovery | CUB-200 (test) | Overall Accuracy | 26.5 | 63 |
| Fine-grained object category discovery | Stanford Cars (test) | -- | -- | 38 |
| Clustering | CIFAR10 unlabelled (train) | Clustering Accuracy | 93.4 | 14 |
| Clustering | ImageNet unlabelled (train) | Clustering Accuracy | 86.7 | 14 |
| Clustering | CIFAR100-20 unlabelled (train) | Clustering Accuracy | 76.4 | 13 |
| Continuous Novel Category Discovery | CIFAR-100 DI scenario | Mf | 0.3 | 5 |
| On-the-fly Category Discovery | Arachnida (test) | Accuracy (All) | 28.1 | 5 |
| On-the-fly Category Discovery | Animalia (test) | Accuracy (All) | 33.4 | 5 |
| On-the-fly Category Discovery | Oxford Pets (test) | Accuracy (All) | 35.2 | 5 |
| On-the-fly Category Discovery | Fungi (test) | Accuracy (All) | 27.5 | 5 |