Open Ad-hoc Categorization with Contextualized Feature Learning
About
Adaptive categorization of visual scenes is essential for AI agents to handle changing tasks. Unlike fixed common categories for plants or animals, ad-hoc categories are created dynamically to serve specific goals. We study open ad-hoc categorization: given a few labeled exemplars and abundant unlabeled data, the goal is to discover the underlying context and to expand ad-hoc categories through semantic extension and visual clustering around it. Building on the insight that ad-hoc and common categories rely on similar perceptual mechanisms, we propose OAK, a simple model that introduces a small set of learnable context tokens at the input of a frozen CLIP and optimizes with both CLIP's image-text alignment objective and GCD's visual clustering objective. On the Stanford and Clevr-4 datasets, OAK achieves state-of-the-art results in accuracy and concept discovery across multiple categorizations, including 87.4% novel accuracy on Stanford Mood, surpassing CLIP and GCD by over 50%. Moreover, OAK produces interpretable saliency maps, focusing on hands for Action, faces for Mood, and backgrounds for Location, promoting transparency and trust while enabling adaptive and generalizable categorization.
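The two ingredients named in the abstract, context tokens prepended to a frozen encoder's input and a joint alignment-plus-clustering objective, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the temperature of 0.07, the mixing `weight`, and the use of an InfoNCE-style contrastive term as a stand-in for the GCD clustering objective are all assumptions made for illustration, and plain Python lists stand in for tensors.

```python
import math

def prepend_context(context_tokens, patch_tokens):
    """Place the (hypothetical) learnable context tokens before the image tokens.

    In a real model only context_tokens would receive gradients; the encoder
    that consumes the sequence would stay frozen.
    """
    return list(context_tokens) + list(patch_tokens)

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def _log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def alignment_loss(image_feat, text_feats, label, temperature=0.07):
    """CLIP-style cross-entropy over image-to-text cosine similarities."""
    logits = [cosine(image_feat, t) / temperature for t in text_feats]
    return _log_sum_exp(logits) - logits[label]

def clustering_loss(anchor, positive, negatives, temperature=0.07):
    """InfoNCE-style contrastive term (assumed stand-in for the GCD objective)."""
    pos = cosine(anchor, positive) / temperature
    logits = [pos] + [cosine(anchor, n) / temperature for n in negatives]
    return _log_sum_exp(logits) - pos

def total_loss(align, cluster, weight=0.5):
    """Convex combination of the two terms; the weight is an assumption."""
    return (1.0 - weight) * align + weight * cluster

# Toy usage: two context tokens prepended to three image tokens.
sequence = prepend_context([[0.0, 0.0], [0.0, 0.0]], [[1.0, 2.0]] * 3)
loss = total_loss(
    alignment_loss([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], label=0),
    clustering_loss([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0]]),
)
```

In this framing the alignment term needs a label, so it applies to the few labeled exemplars, while the contrastive term can use unlabeled images, for example by treating two augmented views of the same image as anchor and positive.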
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Generalized Category Discovery | Stanford Cars | Accuracy (All) | 65.9 | 128 |
| Generalized Category Discovery | Clevr-4 (Known) | Texture Acc | 82.3 | 11 |
| Generalized Category Discovery | CUB-200 full-shot | Accuracy (Old Categories) | 59.6 | 6 |
| Generalized Category Discovery | Stanford Action, Location, Mood (Known) | Action Accuracy | 88.9 | 6 |
| Generalized Category Discovery | Stanford Action, Location, Mood (Novel) | Action Acc | 85.1 | 5 |
| Generalized Category Discovery | Stanford Action, Location, Mood (Overall) | Action Accuracy | 86.9 | 5 |
| Generalized Category Discovery | Clevr 4 (Novel) | Texture Accuracy | 47.8 | 5 |
| Generalized Category Discovery | DTD v1 (test) | Old Score | 56.7 | 4 |
| Novel Class Discovery | Clevr-4 full-shot | Texture Accuracy | 66.5 | 4 |