
Learning Consistent Taxonomic Classification through Hierarchical Reasoning

About

While Vision-Language Models (VLMs) excel at visual understanding, they often fail to grasp hierarchical knowledge. This leads to a common error: VLMs misclassify coarser taxonomic levels even when they correctly identify the most specific level (the leaf level). Existing approaches largely overlook this issue because they do not model hierarchical reasoning. To address this gap, we propose VL-Taxon, a two-stage, hierarchy-based reasoning framework designed to improve both leaf-level accuracy and hierarchical consistency in taxonomic classification. The first stage employs a top-down process to improve leaf-level classification accuracy. The second stage then uses this accurate leaf-level output to ensure consistency across the entire taxonomic hierarchy. Each stage is first trained with supervised fine-tuning to instill taxonomic knowledge, followed by reinforcement learning to refine the model's reasoning and generalization capabilities. Extensive experiments reveal a remarkable result: VL-Taxon, implemented on Qwen2.5-VL-7B, outperforms the original Qwen2.5-VL-72B model by over 10% on average in both leaf-level accuracy and hierarchical consistency accuracy on the iNaturalist-2021 dataset. Notably, this gain was achieved by fine-tuning on only a small subset of the data, without relying on any examples generated by other VLMs.
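The idea that an accurate leaf-level prediction can anchor the whole hierarchy follows from the tree structure of a taxonomy. Each leaf has exactly one chain of ancestors, so a correct species label fixes every coarser label. The minimal sketch below makes "hierarchical consistency" concrete. It assumes a hypothetical toy taxonomy stored as a child-to-parent map with four simplified ranks (order, family, genus, species), and the helper names `path_from_leaf` and `is_consistent` are illustrative only. This is not VL-Taxon's implementation: the paper's second stage is a trained VLM reasoning step, not a table lookup.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (illustration only): child -> parent, None marks the root.
# Ranks run coarse-to-fine: order -> family -> genus -> species.
PARENT: Dict[str, Optional[str]] = {
    "Passeriformes": None,
    "Corvidae": "Passeriformes",
    "Paridae": "Passeriformes",
    "Corvus": "Corvidae",
    "Pica": "Corvidae",
    "Parus": "Paridae",
    "Corvus corax": "Corvus",
    "Pica pica": "Pica",
    "Parus major": "Parus",
}


def path_from_leaf(leaf: str) -> List[str]:
    """Follow parent links from a leaf to the root and return the path coarse-to-fine."""
    path: List[str] = []
    node: Optional[str] = leaf
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return list(reversed(path))


def is_consistent(pred: List[str]) -> bool:
    """Check that a coarse-to-fine prediction is a valid root-anchored chain in the tree."""
    # The first label must be a known root (its parent is None).
    if not pred or PARENT.get(pred[0], "unknown") is not None:
        return False
    # Every label must be a child of the label predicted one level above it.
    return all(PARENT.get(child) == parent for parent, child in zip(pred, pred[1:]))


if __name__ == "__main__":
    print(path_from_leaf("Pica pica"))
    # ['Passeriformes', 'Corvidae', 'Pica', 'Pica pica']

    # Correct leaf but wrong family: the error mode the abstract describes.
    print(is_consistent(["Passeriformes", "Paridae", "Pica", "Pica pica"]))  # False

    # A path derived from the leaf is consistent by construction.
    print(is_consistent(path_from_leaf("Parus major")))  # True
```

The parent map is used because a tree gives every node at most one parent, so walking upward from a leaf is unambiguous. Walking downward would require choosing among several children at each level.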

Zhenghong Li, Kecheng Zheng, Haibin Ling • 2026

Related benchmarks

| Task | Dataset | Result (HCA) | Rank |
|---|---|---|---|
| Taxonomic Classification | iNat Animal 21 | 43.73 | 9 |
| Taxonomic Classification | iNat Plant 21 | 63.04 | 9 |
| Taxonomic Classification | CUB-200 | 60.67 | 9 |
| Open-set taxonomic classification | iNat Animal 2021 (test) | 1.53e+3 | 3 |
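This page reports HCA without defining it. The minimal sketch below contrasts it with leaf-level accuracy under an assumed strict definition: a sample counts as correct only if its prediction matches the ground truth at every taxonomic level, and the result is reported as a percentage. The paper's exact definition, particularly for the open-set row, may differ. The function names and the two-sample example are hypothetical.

```python
from typing import List, Sequence

# Each prediction or ground truth is a coarse-to-fine list of labels, one per level.
Path = Sequence[str]


def _check(preds: Sequence[Path], golds: Sequence[Path]) -> None:
    """Require one prediction per ground-truth sample and at least one sample."""
    if not golds or len(preds) != len(golds):
        raise ValueError("need equally many predictions and ground truths (at least one)")


def leaf_accuracy(preds: Sequence[Path], golds: Sequence[Path]) -> float:
    """Percentage of samples whose finest-level (last) label is correct."""
    _check(preds, golds)
    hits = sum(p[-1] == g[-1] for p, g in zip(preds, golds))
    return 100.0 * hits / len(golds)


def hierarchical_consistency_accuracy(preds: Sequence[Path], golds: Sequence[Path]) -> float:
    """ASSUMED strict definition: a sample counts only if every level matches."""
    _check(preds, golds)
    hits = sum(list(p) == list(g) for p, g in zip(preds, golds))
    return 100.0 * hits / len(golds)


if __name__ == "__main__":
    golds: List[List[str]] = [
        ["Passeriformes", "Corvidae", "Pica", "Pica pica"],
        ["Passeriformes", "Paridae", "Parus", "Parus major"],
    ]
    preds: List[List[str]] = [
        ["Passeriformes", "Paridae", "Pica", "Pica pica"],  # right leaf, wrong family
        ["Passeriformes", "Paridae", "Parus", "Parus major"],  # fully correct
    ]
    print(leaf_accuracy(preds, golds))  # 100.0
    print(hierarchical_consistency_accuracy(preds, golds))  # 50.0
```

The gap between the two numbers in this toy example is the failure mode the abstract targets: a model can name every leaf correctly and still score poorly once all levels of the hierarchy must agree.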
