Learning Consistent Taxonomic Classification through Hierarchical Reasoning

About

While Vision-Language Models (VLMs) excel at visual understanding, they often fail to grasp hierarchical knowledge. This leads to common errors where VLMs misclassify coarser taxonomic levels even when correctly identifying the most specific level (leaf level). Existing approaches largely overlook this issue by failing to model hierarchical reasoning. To address this gap, we propose VL-Taxon, a two-stage, hierarchy-based reasoning framework designed to improve both leaf-level accuracy and hierarchical consistency in taxonomic classification. The first stage employs a top-down process to enhance leaf-level classification accuracy. The second stage then leverages this accurate leaf-level output to ensure consistency throughout the entire taxonomic hierarchy. Each stage is initially trained with supervised fine-tuning to instill taxonomy knowledge, followed by reinforcement learning to refine the model's reasoning and generalization capabilities. Extensive experiments reveal a remarkable result: our VL-Taxon framework, implemented on the Qwen2.5-VL-7B model, outperforms its original 72B counterpart by over 10% in both leaf-level and hierarchical consistency accuracy on average on the iNaturalist-2021 dataset. Notably, this significant gain was achieved by fine-tuning on just a small subset of data, without relying on any examples generated by other VLMs.

Zhenghong Li, Kecheng Zheng, Haibin Ling • 2026

Related benchmarks

Task	Dataset	Metric	Result	Rank
Taxonomic Classification	iNat Animal 21	HCA	43.73	9
Taxonomic Classification	iNat Plant 21	HCA	63.04	9
Taxonomic Classification	CUB-200	HCA	60.67	9
Open-set taxonomic classification	iNat Animal 2021 (test)	HCA	1.53e+3	3
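
The abstract distinguishes leaf-level accuracy from hierarchical consistency accuracy (HCA), the metric reported in the table. The sketch below illustrates the difference under one assumed definition: a sample counts toward HCA only if every taxonomic level is predicted correctly. This definition, the level names, and the function names are assumptions for illustration and are not taken from the paper.

```python
from typing import Sequence

# Hypothetical taxonomy levels, ordered coarse to fine (illustrative only).
LEVELS = ("order", "family", "genus", "species")

Path = Sequence[str]  # one label per level, in LEVELS order


def leaf_accuracy(preds: Sequence[Path], golds: Sequence[Path]) -> float:
    """Fraction of samples whose most specific (last) label is correct."""
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and equally long")
    hits = sum(p[-1] == g[-1] for p, g in zip(preds, golds))
    return hits / len(golds)


def hierarchical_consistency_accuracy(
    preds: Sequence[Path], golds: Sequence[Path]
) -> float:
    """Fraction of samples whose labels are correct at every level (assumed definition)."""
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and equally long")
    hits = sum(tuple(p) == tuple(g) for p, g in zip(preds, golds))
    return hits / len(golds)


golds = [
    ("Passeriformes", "Corvidae", "Corvus", "Corvus corax"),
    ("Passeriformes", "Corvidae", "Cyanocitta", "Cyanocitta cristata"),
]
preds = [
    ("Passeriformes", "Corvidae", "Corvus", "Corvus corax"),
    # Species is right, but the family is wrong: the failure mode the abstract describes.
    ("Passeriformes", "Paridae", "Cyanocitta", "Cyanocitta cristata"),
]

print(leaf_accuracy(preds, golds))                      # 1.0
print(hierarchical_consistency_accuracy(preds, golds))  # 0.5
```

In this toy example the second prediction names the correct species but the wrong family, so leaf-level accuracy stays at 1.0 while the assumed HCA drops to 0.5.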
