Fully Convolutional Attention Networks for Fine-Grained Recognition
About
Fine-grained recognition is challenging because local inter-class differences are subtle while intra-class variations, such as pose, are large. A key to addressing this problem is localizing discriminative parts from which pose-invariant features can be extracted. However, ground-truth part annotations can be expensive to acquire. Moreover, it is hard to define parts for many fine-grained classes. This work introduces Fully Convolutional Attention Networks (FCANs), a reinforcement learning framework that learns to glimpse local discriminative regions and adapts to different fine-grained domains. Compared to previous methods, our approach has three advantages: 1) the weakly-supervised reinforcement learning procedure requires no expensive part annotations; 2) the fully-convolutional architecture speeds up both training and testing; 3) the greedy reward strategy accelerates convergence during learning. We demonstrate the effectiveness of our method with extensive experiments on four challenging fine-grained benchmark datasets: CUB-200-2011, Stanford Dogs, Stanford Cars and Food-101.
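The abstract does not spell out the learning rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a plain REINFORCE update over a flattened attention score map (one score per spatial location, as a fully-convolutional layer would produce) and reads "greedy reward" as an immediate reward of 1 when the classification at the current glimpse is correct and 0 otherwise. The toy classifier, the function names, and the choice of location 2 as the discriminative one are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw attention scores into a probability for each location."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_location(probs, rng):
    """Sample one flat location index from a categorical distribution."""
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1


def reinforce_grad(scores, chosen, reward):
    """Gradient of reward * log p(chosen) with respect to the raw scores."""
    probs = softmax(scores)
    return [reward * ((1.0 if i == chosen else 0.0) - p)
            for i, p in enumerate(probs)]


def greedy_reward(predicted, label):
    """Assumed reward: 1 for a correct prediction at this glimpse, else 0."""
    return 1.0 if predicted == label else 0.0


def toy_classifier(location):
    """Hypothetical stand-in: only a glimpse at location 2 yields class 1."""
    return 1 if location == 2 else 0


def train_toy(seed=0, steps=200, lr=0.5):
    """Gradient ascent on a 2x2 score map, flattened to four locations."""
    rng = random.Random(seed)
    scores = [0.0, 0.0, 0.0, 0.0]
    label = 1
    for _ in range(steps):
        loc = sample_location(softmax(scores), rng)
        reward = greedy_reward(toy_classifier(loc), label)
        grad = reinforce_grad(scores, loc, reward)
        scores = [s + lr * g for s, g in zip(scores, grad)]
    return softmax(scores)


if __name__ == "__main__":
    print(train_toy())
```

Because the reward arrives at the same step as the glimpse that earned it, each update credits exactly one sampled location, which is one plausible reason an immediate reward could converge faster than a delayed one. In this toy setting the probability mass shifts toward the single location that produces a correct prediction.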
Related benchmarks
| Task | Dataset | Metric | Result (%) | Rank |
|---|---|---|---|---|
| Fine-grained Image Classification | CUB200 2011 (test) | Accuracy | 84.7 | 536 |
| Image Classification | Food-101 | Accuracy | 86.5 | 494 |
| Fine-grained Image Classification | Stanford Cars (test) | Accuracy | 93.1 | 348 |
| Image Classification | Stanford Cars (test) | Accuracy | 93.1 | 306 |
| Fine-grained Image Classification | Stanford Cars | Accuracy | 91.3 | 206 |
| Fine-grained Image Classification | Stanford Dogs (test) | Accuracy | 88.9 | 117 |
| Image Classification | Stanford Dogs (test) | Top-1 Accuracy | 84.2 | 85 |
| Fine-grained Visual Categorization | Stanford Dogs | Accuracy | 89 | 51 |
| Fine-grained Visual Categorization | CUB-Birds | Accuracy | 84.3 | 26 |