Visual Species Recognition with Large Multimodal Models as Post-Hoc Correctors

About

Visual Species Recognition (VSR) is a fundamental task in scientific disciplines that require species-level identification, including ecology, palynology, evolutionary biology, systematics, and phylogenetics. Automating VSR through machine learning can significantly accelerate these efforts. However, species-level annotation requires extensive domain expertise, making large-scale labeled datasets difficult to obtain. Consequently, few-shot learning (FSL) is a practical paradigm, where an expert model is trained using only a few labeled examples. Meanwhile, Large Multimodal Models (LMMs) have demonstrated unprecedented zero-shot visual recognition capabilities, raising the question of whether they can serve as an alternative to FSL expert models for VSR. We start this work with a systematic comparison between FSL expert models and LMMs, revealing that, despite advanced prompting strategies, contemporary LMMs significantly underperform FSL expert models. Interestingly, we find that LMMs possess a complementary strength: given an image and a shortlist of candidate species generated by an expert model, LMMs can often recover the correct label when the expert model's top prediction is incorrect. Motivated by this, we propose Post-hoc Correction (POC), a simple training-free framework that leverages an LMM to post-process an expert model's top predictions. We develop a multimodal prompting strategy to enable POC to improve FSL expert models by 6.4 accuracy points, averaged over five VSR benchmarks. We show that POC generalizes across diverse FSL methods, visual encoders, and LMMs, making it a practical and effective framework for VSR.
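The correction step described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The expert model is represented only by its output logits. The LMM is an abstract callable `lmm(prompt, image)` that returns text. The prompt template, the `k=5` default, and the fall-back-to-top-1 rule are all hypothetical choices made for illustration. The paper's actual multimodal prompting strategy is not reproduced here.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_candidates(logits: Sequence[float], class_names: Sequence[str],
                     k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (name, probability) pairs from the expert's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(class_names[i], probs[i]) for i in order]


def build_prompt(candidates: List[Tuple[str, float]]) -> str:
    """Hypothetical text prompt: the shortlist with the expert's confidences."""
    lines = ["The image shows one of the following species.",
             "Expert model confidences are given in parentheses."]
    for idx, (name, p) in enumerate(candidates, start=1):
        lines.append(f"{idx}. {name} ({p:.2f})")
    lines.append("Answer with the exact species name only.")
    return "\n".join(lines)


def post_hoc_correct(image: object, logits: Sequence[float],
                     class_names: Sequence[str],
                     lmm: Callable[[str, object], str], k: int = 5) -> str:
    """Let the LMM re-rank the expert's top-k shortlist for one image."""
    candidates = top_k_candidates(logits, class_names, k)
    answer = lmm(build_prompt(candidates), image).strip().lower()
    for name, _ in candidates:
        if name.lower() == answer:
            return name  # the LMM picked a valid shortlist entry
    return candidates[0][0]  # off-list or unparsable reply: keep the expert's top-1


# Usage with a stub LMM (hypothetical species names and logits).
names = ["Anas platyrhynchos", "Anas rubripes", "Aythya affinis"]
logits = [2.0, 1.5, -1.0]
stub_lmm = lambda prompt, image: "Anas rubripes"
print(post_hoc_correct(None, logits, names, stub_lmm, k=2))  # -> Anas rubripes
```

Constraining the LMM's answer to the expert's shortlist, and falling back to the expert's top-1 prediction when the reply does not match any candidate, keeps the corrector from introducing labels outside the candidate set. This matches the "post-process the top predictions" framing above, though the fallback rule itself is an assumption.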

Tian Liu, Anwesha Basu, James Caverlee, Shu Kong • 2025

Related benchmarks

Task	Dataset	Result
Image Classification	Fungi	--	25
Few-shot Image Classification	Aves	Accuracy 69.4	22
Fine-grained species classification	iNaturalist Aves 16-shot 2018 (test)	Accuracy 69.4	18
Fine-grained species classification	Insecta Species196 16-shot (test)	Accuracy 70.8	18
Fine-grained species classification	Weeds Species196 16-shot (test)	Accuracy 87.7	18
Fine-grained species classification	Mollusca Species196 16-shot (test)	Accuracy 71.6	18
Fine-grained species classification	Fungi FungiTastic 16-shot (test)	Accuracy 31.1	18
Few-shot Image Classification	Fungi	Accuracy 15	8
Visual Species Recognition	Aves	--	6
Visual Species Recognition	Insecta	--	6
