Democratizing Fine-grained Visual Recognition with Large Language Models
About
Identifying subordinate-level categories from images is a longstanding task in computer vision, referred to as fine-grained visual recognition (FGVR). It has tremendous significance in real-world applications, since an average layperson struggles to differentiate species of birds or mushrooms because of the subtle differences among them. A major bottleneck in developing FGVR systems is the need for high-quality paired expert annotations. To circumvent the need for expert knowledge, we propose Fine-grained Semantic Category Reasoning (FineR), which leverages the world knowledge of large language models (LLMs) as a proxy to reason about fine-grained category names. In detail, to bridge the modality gap between images and LLMs, we extract part-level visual attributes from images as text and feed that information to an LLM. Based on the visual attributes and its internal world knowledge, the LLM reasons about the subordinate-level category names. Our training-free FineR outperforms several state-of-the-art FGVR and language and vision assistant models, and shows promise in working in the wild and in new domains where gathering expert annotations is arduous.
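The attribute-to-text step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `extractor` and `llm` callables, the prompt wording, the `super_category` argument, and all function names are hypothetical placeholders, assumed here only to show how part-level attributes rendered as text could be handed to an LLM that proposes category names.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces (assumptions, not taken from the paper):
# an extractor maps (image_path, attribute_name) to a text description,
# and a language model maps a prompt string to a completion string.
AttributeExtractor = Callable[[str, str], str]
LanguageModel = Callable[[str], str]


def describe_image(
    image_path: str, attributes: List[str], extractor: AttributeExtractor
) -> Dict[str, str]:
    """Express part-level visual attributes of one image as text."""
    return {name: extractor(image_path, name) for name in attributes}


def build_prompt(super_category: str, descriptions: Dict[str, str]) -> str:
    """Assemble an illustrative prompt from the attribute descriptions."""
    lines = [f"- {name}: {value}" for name, value in descriptions.items()]
    return (
        f"An image shows a {super_category} with these visual attributes:\n"
        + "\n".join(lines)
        + f"\nList the most likely fine-grained {super_category} names, one per line."
    )


def propose_category_names(
    image_path: str,
    super_category: str,
    attributes: List[str],
    extractor: AttributeExtractor,
    llm: LanguageModel,
) -> List[str]:
    """Ask the LLM for candidate category names and parse one name per line."""
    descriptions = describe_image(image_path, attributes, extractor)
    completion = llm(build_prompt(super_category, descriptions))
    names = []
    for line in completion.splitlines():
        # Drop leading bullet characters and surrounding whitespace.
        name = line.strip().lstrip("-* ").strip()
        if name:
            names.append(name)
    return names
```

In practice, `extractor` would wrap some vision-language model and `llm` would wrap a text-only LLM; both are left as plain callables so the sketch stays independent of any particular library.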
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Few-shot Image Classification | Aves | Accuracy | 42.9 | 22 |
| Fine-grained species classification | Mollusca Species196 16-shot (test) | Accuracy | 47.7 | 18 |
| Fine-grained species classification | Fungi FungiTastic 16-shot (test) | Accuracy | 8.9 | 18 |
| Fine-grained species classification | Insecta Species196 16-shot (test) | Accuracy | 32.8 | 18 |
| Fine-grained species classification | Weeds Species196 16-shot (test) | Accuracy | 65 | 18 |
| Fine-grained species classification | iNaturalist Aves 16-shot 2018 (test) | Accuracy | 47.5 | 18 |
| Fine grained classification | Birds-200 | cACC | 51.1 | 12 |
| Fine grained classification | Dogs-120 | cACC | 48.1 | 11 |
| Fine grained classification | Flowers-102 | cACC | 63.8 | 11 |
| Fine grained classification | CARS 196 | cACC | 49.2 | 10 |