
Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery

About

Generalized Category Discovery (GCD) aims to identify both known and unknown categories when only partial labels are given for the known categories, which poses a challenging open-set recognition problem. State-of-the-art approaches to the GCD task are usually built on multi-modal representation learning, which depends heavily on inter-modality alignment. However, few of them impose a proper intra-modality alignment to produce the desired underlying structure in the representation distributions. In this paper, we propose a novel and effective multi-modal representation learning framework for GCD via Semi-Supervised Rate Reduction, called SSR$^2$-GCD, which learns cross-modality representations with desired structural properties by emphasizing the proper alignment of intra-modality relationships. Moreover, to boost knowledge transfer, we integrate prompt candidates by leveraging the inter-modal alignment offered by Vision Language Models. We conduct extensive experiments on generic and fine-grained benchmark datasets, which demonstrate the superior performance of our approach.

Wei He, Xianghan Meng, Zhiyuan Huang, Xianbiao Qi, Rong Xiao, Chun-Guang Li • 2026
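
The abstract names rate reduction as the basis of the method but does not spell out the objective. As a rough illustration only, the sketch below computes the standard coding rate reduction from the earlier MCR$^2$ literature: the coding rate of all features minus the size-weighted coding rates of each group. It is not the semi-supervised variant proposed in the paper, whose exact form is not given here. The function names, the `eps` distortion parameter, and the toy data are assumptions made for this example.

```python
import numpy as np


def coding_rate(Z, eps=0.5):
    """Coding rate R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z Z^T).

    Z has shape (d, n): n feature vectors of dimension d, stored as columns.
    """
    d, n = Z.shape
    gram = np.eye(d) + (d / (n * eps**2)) * (Z @ Z.T)
    _, logdet = np.linalg.slogdet(gram)  # numerically stable log-determinant
    return 0.5 * logdet


def rate_reduction(Z, labels, eps=0.5):
    """Rate reduction: R(Z) minus the size-weighted sum of per-group rates.

    `labels` is a length-n integer array assigning each column of Z to a group.
    (Hypothetical helper for illustration; not the paper's objective.)
    """
    _, n = Z.shape
    compressed = 0.0
    for k in np.unique(labels):
        Zk = Z[:, labels == k]
        compressed += (Zk.shape[1] / n) * coding_rate(Zk, eps)
    return coding_rate(Z, eps) - compressed


if __name__ == "__main__":
    # Toy data: two groups lying on orthogonal axes of a 4-dimensional space.
    Z = np.zeros((4, 20))
    Z[0, :10] = 1.0
    Z[1, 10:] = 1.0
    aligned = np.array([0] * 10 + [1] * 10)  # grouping matches the structure
    mixed = np.array([0, 1] * 10)            # grouping ignores the structure
    print("aligned grouping:", rate_reduction(Z, aligned))
    print("mixed grouping:  ", rate_reduction(Z, mixed))
```

In this toy setup the grouping that matches the two orthogonal directions yields a larger rate reduction than the grouping that mixes them, which is the intuition behind using the quantity to shape representation structure.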

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Generalized Category Discovery | ImageNet-100 | All Accuracy | 92.1 | 208 |
| Generalized Category Discovery | CIFAR-100 | Accuracy (All) | 86.4 | 185 |
| Generalized Category Discovery | Stanford Cars | Accuracy (All) | 89.2 | 160 |
| Generalized Category Discovery | CUB | Accuracy (All) | 78.3 | 133 |
| Generalized Category Discovery | CIFAR-10 | All Accuracy | 98.5 | 105 |
| Generalized Category Discovery | Oxford Pets | Accuracy (All) | 95.7 | 50 |
| Generalized Category Discovery | ImageNet-1K | Accuracy (All) | 66.7 | 19 |
| Generalized Category Discovery | Flowers102 | Accuracy (All) | 93.5 | 10 |
