CLIP-GCD: Simple Language Guided Generalized Category Discovery

About

Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data. Prior methods leveraged self-supervised pre-training combined with supervised fine-tuning on the labeled data, followed by simple clustering methods. In this paper, we posit that such methods are still prone to poor performance on out-of-distribution categories, and do not leverage a key ingredient: semantic relationships between object categories. We therefore propose to leverage multi-modal (vision and language) models in two complementary ways. First, we establish a strong baseline by replacing uni-modal features with CLIP, inspired by its zero-shot performance. Second, we propose a novel retrieval-based mechanism that leverages CLIP's aligned vision-language representations by mining text descriptions from a text corpus for the labeled and unlabeled sets. Specifically, we use the alignment between CLIP's visual encoding of the image and textual encoding of the corpus to retrieve the top-k relevant pieces of text, and incorporate their embeddings to perform joint image+text semi-supervised clustering. We perform rigorous experimentation and ablations (including on where to retrieve from, how much to retrieve, and how to combine information), and validate our results on several datasets including out-of-distribution domains, demonstrating state-of-the-art results.

Rabah Ouldnoughi, Chia-Wen Kuo, Zsolt Kira • 2023
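
The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: random vectors stand in for CLIP image and text embeddings, and the function names (`retrieve_topk_text`, `joint_features`), the mean pooling of retrieved text embeddings, and the concatenation of image and text features are all assumptions chosen for illustration. The abstract states only that the top-k texts are retrieved through image-text alignment and that their embeddings are combined with the image features before semi-supervised clustering.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def retrieve_topk_text(image_emb, text_emb, k):
    """For each image, return indices and similarities of the k closest texts."""
    sims = l2_normalize(image_emb) @ l2_normalize(text_emb).T  # (n_images, n_texts)
    topk_idx = np.argsort(-sims, axis=1)[:, :k]                # highest similarity first
    topk_sims = np.take_along_axis(sims, topk_idx, axis=1)
    return topk_idx, topk_sims


def joint_features(image_emb, text_emb, k):
    """Build joint image+text features.

    Assumption: retrieved text embeddings are mean-pooled and concatenated
    with the image embedding; other ways of combining them are possible.
    """
    topk_idx, _ = retrieve_topk_text(image_emb, text_emb, k)
    pooled_text = l2_normalize(l2_normalize(text_emb)[topk_idx].mean(axis=1))
    return np.concatenate([l2_normalize(image_emb), pooled_text], axis=1)


# Hypothetical stand-ins for embeddings produced by CLIP's two encoders.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(4, 8))    # 4 images, 8-dim embeddings
text_emb = rng.normal(size=(20, 8))    # 20 corpus sentences, 8-dim embeddings

feats = joint_features(image_emb, text_emb, k=3)
print(feats.shape)  # (4, 16)
```

The resulting joint features would then be passed to a semi-supervised clustering routine, which this sketch leaves out.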

Related benchmarks

Task                            Dataset        Metric          Result  Rank
Generalized Category Discovery  ImageNet-100   All Accuracy    84      138
Generalized Category Discovery  CIFAR-100      Accuracy (All)  85.2    133
Generalized Category Discovery  Stanford Cars  Accuracy (All)  70.6    128
Generalized Category Discovery  CUB            Accuracy (All)  62.8    113
Generalized Category Discovery  CIFAR-10       All Accuracy    96.6    105
Generalized Category Discovery  FGVC Aircraft  Accuracy (All)  50      82
Generalized Category Discovery  Oxford Pets    Accuracy (All)  70.6    11
Generalized Category Discovery  Flowers102     Accuracy (All)  76.3    10