Evaluation of Output Embeddings for Fine-Grained Image Classification
About
Image classification has advanced significantly in recent years with the availability of large-scale image datasets. However, fine-grained classification remains a major challenge because annotating large numbers of fine-grained categories is costly. This project shows that compelling classification performance can be achieved on such categories even without labeled training data. Given image and class embeddings, we learn a compatibility function such that matching embeddings are assigned a higher score than mismatching ones; zero-shot classification of an image proceeds by finding the label yielding the highest joint compatibility score. We use state-of-the-art image features and compare supervised output embeddings (attributes) with unsupervised output embeddings that are either derived from hierarchies or learned from unlabeled text corpora. We establish a substantially improved state of the art on the Animals with Attributes and Caltech-UCSD Birds datasets. Most encouragingly, we demonstrate that purely unsupervised output embeddings (learned from Wikipedia and improved with fine-grained text) achieve compelling results, even outperforming the previous supervised state of the art. Combining different output embeddings improves results further.
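The compatibility-based prediction described above can be illustrated with a small sketch. It assumes a bilinear compatibility function F(x, y) = xᵀWy and a simple margin-based ranking update. The abstract names neither choice, so both are assumptions made for illustration. All function names, embeddings, and class labels here are hypothetical toy values, not the project's features or data.

```python
# Hypothetical sketch of compatibility-based zero-shot classification.
# Assumptions (not taken from the source): a bilinear form F(x, y) = x^T W y,
# a margin-based ranking update, and hand-made 2-D toy embeddings.

def compatibility(x, W, y):
    """Score F(x, y) = x^T W y for image embedding x and class embedding y."""
    return sum(x[i] * W[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def predict(x, W, class_embeddings):
    """Return the label whose class embedding gives the highest score."""
    return max(class_embeddings,
               key=lambda label: compatibility(x, W, class_embeddings[label]))

def train(samples, class_embeddings, lr=0.1, margin=1.0, epochs=20):
    """Learn W so the true class outscores every other class by a margin."""
    dim_x = len(samples[0][0])
    dim_y = len(next(iter(class_embeddings.values())))
    W = [[0.0] * dim_y for _ in range(dim_x)]
    for _ in range(epochs):
        for x, label in samples:
            y_true = class_embeddings[label]
            s_true = compatibility(x, W, y_true)
            # Find the wrong label that violates the margin the most.
            worst, worst_violation = None, 0.0
            for other, y_other in class_embeddings.items():
                if other == label:
                    continue
                violation = margin + compatibility(x, W, y_other) - s_true
                if violation > worst_violation:
                    worst, worst_violation = other, violation
            if worst is None:
                continue  # margin already satisfied for this sample
            y_wrong = class_embeddings[worst]
            # Gradient step: raise the true score, lower the violating one.
            for i in range(dim_x):
                for j in range(dim_y):
                    W[i][j] += lr * x[i] * (y_true[j] - y_wrong[j])
    return W

# Seen classes (used for training) and unseen classes (used only at test time).
# Class embeddings are toy attribute vectors: [striped, aquatic].
seen = {"zebra": [1.0, 0.0], "dolphin": [0.0, 1.0]}
unseen = {"tiger": [0.9, 0.1], "seal": [0.1, 0.9]}

# Toy image embeddings paired with seen-class labels.
samples = [([1.0, 0.1], "zebra"), ([0.1, 1.0], "dolphin")]

W = train(samples, seen)

# Zero-shot step: score test images against classes never seen in training.
print(predict([0.9, 0.2], W, unseen))  # tiger
print(predict([0.2, 0.9], W, unseen))  # seal
```

The point of the sketch is that W is learned only from seen classes, yet it can rank unseen classes because every class is represented in the same output embedding space.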
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Action Recognition | UCF101 | Accuracy | 9.9 | 365 |
| Generalized Zero-Shot Learning | CUB | H Score | 33.6 | 250 |
| Action Recognition | HMDB51 | 3-Fold Accuracy | 13.3 | 191 |
| Generalized Zero-Shot Learning | SUN | H | 19.8 | 184 |
| Generalized Zero-Shot Learning | AWA2 | S Score | 73.9 | 165 |
| Zero-shot Learning | CUB | Top-1 Accuracy | 53.9 | 144 |
| Zero-shot Learning | SUN | Top-1 Accuracy | 53.7 | 114 |
| Zero-shot Learning | AWA2 | Top-1 Accuracy | 0.619 | 95 |
| Image Classification | CUB-200 | Accuracy | 50.1 | 92 |
| Image Classification | CUB | Unseen Top-1 Acc | 23.5 | 89 |