Open Ad-hoc Categorization with Contextualized Feature Learning

About

Adaptive categorization of visual scenes is essential for AI agents to handle changing tasks. Unlike fixed common categories for plants or animals, ad-hoc categories are created dynamically to serve specific goals. We study open ad-hoc categorization: Given a few labeled exemplars and abundant unlabeled data, the goal is to discover the underlying context and to expand ad-hoc categories through semantic extension and visual clustering around it. Building on the insight that ad-hoc and common categories rely on similar perceptual mechanisms, we propose OAK, a simple model that introduces a small set of learnable context tokens at the input of a frozen CLIP and optimizes with both CLIP's image-text alignment objective and GCD's visual clustering objective. On Stanford and Clevr-4 datasets, OAK achieves state-of-the-art in accuracy and concept discovery across multiple categorizations, including 87.4% novel accuracy on Stanford Mood, surpassing CLIP and GCD by over 50%. Moreover, OAK produces interpretable saliency maps, focusing on hands for Action, faces for Mood, and backgrounds for Location, promoting transparency and trust while enabling adaptive and generalizable categorization.

Zilin Wang, Sangwoo Mo, Stella X. Yu, Sima Behpour, Liu Ren • 2025
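
The abstract describes placing a small set of learnable context tokens at the input of a frozen CLIP encoder. The toy sketch below illustrates only that idea, under loud assumptions: the "encoder" is a single frozen self-attention layer with random weights (not CLIP), the token counts and dimensions are arbitrary, and the names `encode` and `alignment_loss` are hypothetical. It shows the forward pass and a cross-entropy image-text alignment term; the GCD clustering objective and the actual training loop are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16        # embedding width (toy value, not CLIP's)
N_PATCH = 8   # patch tokens per image (toy value)
N_CTX = 4     # number of learnable context tokens (assumed for illustration)
N_CLASS = 3   # number of ad-hoc categories with text embeddings (toy value)

# Frozen stand-in encoder: one self-attention layer with fixed random weights.
W_q, W_k, W_v = (rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def encode(patches, context):
    """Prepend context tokens to the patch tokens and run the frozen layer."""
    tokens = np.concatenate([context, patches], axis=0)
    q, k, v = tokens @ W_q, tokens @ W_k, tokens @ W_v
    attn = softmax(q @ k.T / np.sqrt(D))
    out = attn @ v
    # Pool over the patch positions only; context tokens act through attention.
    feat = out[context.shape[0]:].mean(axis=0)
    return feat / np.linalg.norm(feat)


def alignment_loss(feat, text_embeds, label, temperature=0.07):
    """Cross-entropy over scaled cosine similarities to class text embeddings."""
    logits = text_embeds @ feat / temperature
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


# Toy inputs: one image's patch tokens and unit-norm class text embeddings.
patches = rng.normal(size=(N_PATCH, D))
text_embeds = rng.normal(size=(N_CLASS, D))
text_embeds /= np.linalg.norm(text_embeds, axis=1, keepdims=True)

# Only `context` would be trained; the encoder weights above stay fixed.
context = rng.normal(scale=0.02, size=(N_CTX, D))
feature = encode(patches, context)
loss = alignment_loss(feature, text_embeds, label=1)
```

In this sketch the same image yields different features under different context tokens, which is the mechanism that would let one frozen backbone serve several categorizations (for example Action, Mood, or Location).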

Related benchmarks

Task | Dataset | Result | Rank
Generalized Category Discovery | Stanford Cars | Accuracy (All): 65.9 | 128
Generalized Category Discovery | Clevr-4 (Known) | Texture Acc: 82.3 | 11
Generalized Category Discovery | CUB-200 full-shot | Accuracy (Old Categories): 59.6 | 6
Generalized Category Discovery | Stanford Action, Location, Mood (Known) | Action Accuracy: 88.9 | 6
Generalized Category Discovery | Stanford Action, Location, Mood (Novel) | Action Acc: 85.1 | 5
Generalized Category Discovery | Stanford Action, Location, Mood (Overall) | Action Accuracy: 86.9 | 5
Generalized Category Discovery | Clevr 4 (Novel) | Texture Accuracy: 47.8 | 5
Generalized Category Discovery | DTD v1 (test) | Old Score: 56.7 | 4
Novel Class Discovery | Clevr-4 full-shot | Texture Accuracy: 66.5 | 4