Your "Flamingo" is My "Bird": Fine-Grained, or Not
About
Whether what you see in Figure 1 is a "flamingo" or a "bird" is the question we ask in this paper. While fine-grained visual classification (FGVC) strives to arrive at the former, for most of us non-experts "bird" would probably suffice. The real question is therefore: how can we cater to different definitions of "fine-grained" under divergent levels of expertise? To that end, we re-envisage the traditional FGVC setting, moving from single-label classification to top-down traversal of a pre-defined coarse-to-fine label hierarchy, so that our answer becomes "bird" → "Phoenicopteriformes" → "Phoenicopteridae" → "flamingo". To approach this new problem, we first conduct a comprehensive human study, which confirms that most participants prefer multi-granularity labels, regardless of whether they consider themselves experts. We then arrive at a key insight: coarse-level label prediction hampers fine-grained feature learning, yet fine-level features improve the learning of the coarse-level classifier. This insight lets us design a very simple yet surprisingly effective solution to the new problem, in which we (i) use level-specific classification heads to disentangle coarse-level features from fine-grained ones, and (ii) allow finer-grained features to participate in coarser-grained label predictions, which in turn yields better disentanglement. Experiments show that our method achieves superior performance in the new FGVC setting and also outperforms the state of the art on the traditional single-label FGVC problem. Thanks to its simplicity, the method is parameter-free and can easily be implemented on top of any existing FGVC framework.
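The abstract describes the method only at a high level: one classification head per hierarchy level, with finer-grained features also feeding coarser-level predictions. The plain-Python sketch below illustrates that information flow at inference time under several assumptions. The backbone output is assumed to be already split into one feature slice per level. Each head is a toy bias-free linear layer. Each level is predicted independently, with no parent-child consistency constraint enforced. All names (`hierarchical_predict`, `linear_scores`) and all numbers are hypothetical. The sketch does not model training, so it says nothing about how, or whether, gradients from coarse-level losses reach the finer-level features.

```python
from typing import List, Sequence

Vector = List[float]


def linear_scores(weights: Sequence[Vector], x: Vector) -> Vector:
    """Hypothetical bias-free linear head: one weight row per class at this level."""
    scores = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError(f"head expects {len(row)} inputs, got {len(x)}")
        scores.append(sum(w * v for w, v in zip(row, x)))
    return scores


def hierarchical_predict(
    level_features: List[Vector],
    level_heads: List[List[Vector]],
    level_labels: List[List[str]],
) -> List[str]:
    """Return one label per level, ordered coarse (index 0) to fine (last).

    Level i's head reads its own feature slice concatenated with the slices of
    every finer level i+1..L-1, so fine-grained features feed coarse heads but
    coarse features never feed fine heads (an assumption of this sketch).
    """
    path = []
    for i, head in enumerate(level_heads):
        # Own features first, then every finer-grained level's features.
        x = [v for feat in level_features[i:] for v in feat]
        scores = linear_scores(head, x)
        best = max(range(len(scores)), key=scores.__getitem__)
        path.append(level_labels[i][best])
    return path


# Toy 4-level label vocabulary based on the abstract's example; all numbers are made up.
labels = [
    ["bird", "mammal"],
    ["Phoenicopteriformes", "Passeriformes"],
    ["Phoenicopteridae", "Corvidae"],
    ["flamingo", "crow"],
]
features = [[0.2], [0.9], [0.8], [0.7]]  # one 1-D slice per level, coarse to fine
heads = [
    [[0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0]],  # coarsest head: 4 inputs
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0], [0.0]],  # finest head: only its own slice
]

print(" -> ".join(hierarchical_predict(features, heads, labels)))
# bird -> Phoenicopteriformes -> Phoenicopteridae -> flamingo
```

In this toy setup the coarsest head's own feature slice actually favours "mammal" (0.2 vs. 0.0). The "bird" decision comes from the finest-level slice (0.7). That is the fine-to-coarse information flow the abstract describes, here reduced to a hand-picked weight.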
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Fine-grained Image Classification | CUB200 2011 (test) | Accuracy | 89.9 | 536 |
| Fine-grained Image Classification | Stanford Cars (test) | Accuracy | 95.1 | 348 |
| Fine-grained visual classification | FGVC-Aircraft (test) | Top-1 Acc | 93.6 | 287 |
| Fine-grained Image Classification | CUB-200 2011 | Accuracy | 91.25 | 222 |
| Fine-grained Image Classification | Stanford Cars | -- | -- | 206 |
| Image Classification | tieredImageNet-H (test) | Hierarchical Distance @1 | 1.93 | 38 |
| Fine grained classification | iNaturalist-19 | Top-1 Error Rate | 37.23 | 24 |
| Hierarchical classification | iNaturalist-19 (test) | Accuracy | 70.37 | 14 |
| Period Dating | Bronze Ding (test) | Overall Accuracy (OA) | 73.92 | 13 |
| Hierarchical classification | CIFAR-100 (test) | Accuracy | 0.7787 | 12 |