Learning a Discriminative Filter Bank within a CNN for Fine-grained Recognition
About
Compared to earlier multistage frameworks using CNN features, recent end-to-end deep approaches for fine-grained recognition essentially enhance the mid-level learning capability of CNNs. Previous approaches achieve this by introducing either an auxiliary network that infuses localization information into the main classification network, or a sophisticated feature encoding method that captures higher-order feature statistics. We show that mid-level representation learning can be enhanced within the CNN framework itself by learning a bank of convolutional filters that capture class-specific discriminative patches, without extra part or bounding box annotations. Such a filter bank is well structured, properly initialized and discriminatively learned through a novel asymmetric multi-stream architecture with convolutional filter supervision and a non-random layer initialization. Experimental results show that our approach achieves state-of-the-art accuracy on three publicly available fine-grained recognition datasets (CUB-200-2011, Stanford Cars and FGVC-Aircraft). Ablation studies and visualizations are provided to help understand our approach.
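The abstract names the components but not their mechanics. As a rough illustration only, the NumPy sketch below shows one way a filter-bank classification stream could be wired. It assumes (none of this is stated in the abstract) that the bank holds k 1×1 filters per class applied to a mid-level feature map, that each filter keeps its maximum response over all spatial positions (its best-matching patch), and that the k pooled responses of each class are averaged into a class score trained with softmax cross-entropy. The initialization helper, which seeds filters with the highest-energy feature vectors instead of random weights, is a hypothetical stand-in for the paper's non-random layer initialization. All function names are illustrative, and the asymmetric multi-stream architecture itself is not modeled.

```python
import numpy as np

def filter_bank_scores(feature_map, filters, num_classes, k):
    """Class scores from a hypothetical bank of 1x1 convolutional filters.

    feature_map: (C, H, W) mid-level CNN features (assumed input).
    filters: (num_classes * k, C); rows [c*k:(c+1)*k] are assumed to belong to class c.
    """
    C, H, W = feature_map.shape
    # A 1x1 convolution is a matrix product applied at every spatial location.
    responses = filters @ feature_map.reshape(C, H * W)    # (M*k, H*W)
    # Global max pooling: each filter keeps its strongest patch response.
    peak = responses.max(axis=1)                            # (M*k,)
    # Cross-channel pooling: average the k filters assigned to each class.
    return peak.reshape(num_classes, k).mean(axis=1)        # (M,)

def softmax_cross_entropy(scores, label):
    """Assumed filter-supervision loss on the pooled class scores."""
    shifted = scores - scores.max()                         # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def init_filters_from_peaks(feature_map, k):
    """Hypothetical non-random init: unit-normalized k highest-energy feature vectors."""
    C, H, W = feature_map.shape
    vecs = feature_map.reshape(C, H * W).T                  # one C-dim vector per location
    energy = np.linalg.norm(vecs, axis=1)
    top = np.argsort(energy)[::-1][:k]                      # strongest locations first
    return vecs[top] / np.maximum(energy[top], 1e-12)[:, None]

# Usage: 3 classes, 2 filters per class, random 8-channel 5x5 feature maps.
rng = np.random.default_rng(0)
M, k, C, H, W = 3, 2, 8, 5, 5
class_maps = [rng.standard_normal((C, H, W)) for _ in range(M)]
filters = np.concatenate([init_filters_from_peaks(f, k) for f in class_maps])
scores = filter_bank_scores(class_maps[1], filters, M, k)
loss = softmax_cross_entropy(scores, label=1)
```

Seeding each class's filters from its own high-energy locations means every filter starts out responding to some discriminative-looking patch, which is the intuition the abstract gives for preferring a non-random initialization; in a real network these weights would then be refined by backpropagation.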
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Fine-grained Image Classification | CUB200 2011 (test) | Accuracy | 87.4 | 536 |
| Fine-grained Image Classification | Stanford Cars (test) | Accuracy | 93.8 | 348 |
| Image Classification | Stanford Cars (test) | Accuracy | 93.8 | 306 |
| Fine-grained visual classification | FGVC-Aircraft (test) | Top-1 Accuracy | 92 | 287 |
| Image Classification | CUB-200-2011 (test) | Top-1 Accuracy | 87.4 | 276 |
| Image Classification | FGVC-Aircraft (test) | Accuracy | 92 | 231 |
| Fine-grained Image Classification | CUB-200 2011 | Accuracy | 87.4 | 222 |
| Fine-grained Image Classification | Stanford Cars | Accuracy | 93.8 | 206 |
| Image Classification | FGVC Aircraft | Top-1 Accuracy | 92 | 185 |
| Fine-grained Visual Categorization | Stanford Cars (test) | Accuracy | 93.8 | 110 |