Democratizing Fine-grained Visual Recognition with Large Language Models

About

Identifying subordinate-level categories from images is a longstanding task in computer vision, referred to as fine-grained visual recognition (FGVR). It has great significance in real-world applications, since an average layperson struggles to differentiate species of birds or mushrooms because of the subtle differences among them. A major bottleneck in developing FGVR systems is the need for high-quality paired expert annotations. To circumvent the need for expert knowledge, we propose Fine-grained Semantic Category Reasoning (FineR), which leverages the world knowledge of large language models (LLMs) as a proxy to reason about fine-grained category names. In detail, to bridge the modality gap between images and LLMs, we extract part-level visual attributes from images as text and feed that information to an LLM. Based on the visual attributes and its internal world knowledge, the LLM reasons about the subordinate-level category names. Our training-free FineR outperforms several state-of-the-art FGVR and language-and-vision assistant models, and shows promise in working in the wild and in new domains where gathering expert annotations is arduous.

Mingxuan Liu, Subhankar Roy, Wenjing Li, Zhun Zhong, Nicu Sebe, Elisa Ricci • 2024
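
The attribute-to-LLM step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the prompt wording, and the reply format are all assumptions made for illustration, and both the attribute extraction from images and the LLM call itself are left out.

```python
from typing import Dict, List


def build_reasoning_prompt(supercategory: str, attributes: Dict[str, str]) -> str:
    """Turn part-level attribute descriptions into a text prompt for an LLM.

    Hypothetical helper: the prompt wording is an assumption, not the paper's.
    """
    lines = [f"- {part}: {description}" for part, description in attributes.items()]
    return (
        f"An image shows a {supercategory} with these visual attributes:\n"
        + "\n".join(lines)
        + f"\nList the most likely fine-grained {supercategory} category names, one per line."
    )


def parse_candidate_names(llm_reply: str) -> List[str]:
    """Split an LLM reply into cleaned, case-insensitively de-duplicated names.

    Hypothetical helper: assumes the reply lists one candidate per line,
    optionally prefixed by a bullet or a number.
    """
    seen = set()
    names = []
    for raw in llm_reply.splitlines():
        # Drop leading bullets, numbering, and surrounding whitespace.
        name = raw.strip().lstrip("-*0123456789. ").strip()
        if name and name.lower() not in seen:
            seen.add(name.lower())
            names.append(name)
    return names


if __name__ == "__main__":
    # Attribute text would come from an image-to-text model; hard-coded here.
    attrs = {"wing": "blue with white bars", "crest": "blue, pointed"}
    print(build_reasoning_prompt("bird", attrs))
    # A made-up reply standing in for real LLM output.
    print(parse_candidate_names("1. Blue Jay\n2. Steller's Jay\n- blue jay\n"))
```

Keeping the image side as plain text is what lets a text-only LLM take part in the pipeline; the sketch shows only that interface, not how the candidate names are later matched back to images.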

Related benchmarks

Task                                 Dataset                               Metric    Result  Rank
Few-shot Image Classification        Aves                                  Accuracy  42.9    22
Fine-grained species classification  Mollusca Species196 16-shot (test)    Accuracy  47.7    18
Fine-grained species classification  Fungi FungiTastic 16-shot (test)      Accuracy  8.9     18
Fine-grained species classification  Insecta Species196 16-shot (test)     Accuracy  32.8    18
Fine-grained species classification  Weeds Species196 16-shot (test)       Accuracy  65      18
Fine-grained species classification  iNaturalist Aves 16-shot 2018 (test)  Accuracy  47.5    18
Fine grained classification          Birds-200                             cACC      51.1    12
Fine grained classification          Dogs-120                              cACC      48.1    11
Fine grained classification          Flowers-102                           cACC      63.8    11
Fine grained classification          CARS 196                              cACC      49.2    10
Showing 10 of 12 rows
