
Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification

About

Deep convolutional neural networks (CNNs) have shown a strong ability to mine discriminative object pose and part information for image recognition. For fine-grained recognition, a context-aware, rich feature representation of the object/scene plays a key role, since the object/scene exhibits significant variance within the same subcategory and only subtle variance among different subcategories. Finding the subtle variance that fully characterizes the object/scene is not straightforward. To address this, we propose a novel context-aware attentional pooling (CAP) that effectively captures subtle changes via sub-pixel gradients, and learns to attend to informative integral regions and their importance in discriminating different subcategories without requiring bounding-box and/or distinguishable part annotations. We also introduce a novel feature encoding that considers the intrinsic consistency between the informativeness of the integral regions and their spatial structures to capture the semantic correlation among them. Our approach is simple yet extremely effective and can be easily applied on top of a standard classification backbone network. We evaluate our approach using six state-of-the-art (SotA) backbone networks and eight benchmark datasets. Our method significantly outperforms the SotA approaches on six datasets and is very competitive with the remaining two.

Ardhendu Behera, Zachary Wharton, Pradeep Hewage, Asish Bera • 2021
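
The abstract describes attending to regions and weighting them by importance before classification. As a rough illustration of that general idea only, the following is a minimal sketch of attention-weighted pooling over region descriptors. It is not the authors' CAP implementation: it omits the integral regions, sub-pixel gradients, and feature encoding mentioned above, and the names `attentional_pool`, `regions`, and `w_score` are hypothetical, chosen for this example.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weighted_sum(weights, vectors):
    # Combine equal-length vectors using one scalar weight per vector.
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]

def attentional_pool(regions, w_score):
    """Pool N region descriptors (each of length D) into one D-vector.

    regions: list of N feature vectors, one per image region (assumed input).
    w_score: hypothetical learned scoring vector of length D.
    """
    scale = math.sqrt(len(regions[0]))
    # Context step: each region attends to all regions (scaled dot product).
    context = []
    for query in regions:
        attn = softmax([dot(query, key) / scale for key in regions])
        context.append(weighted_sum(attn, regions))
    # Importance step: one weight per region, normalised to sum to 1.
    importance = softmax([dot(c, w_score) for c in context])
    pooled = weighted_sum(importance, context)
    return pooled, importance

# Toy input: 3 regions with 4-dimensional features (made-up numbers).
regions = [
    [1.0, 0.0, 0.5, 0.2],
    [0.9, 0.1, 0.4, 0.3],
    [0.0, 1.0, 0.0, 0.8],
]
w_score = [0.5, -0.5, 0.25, 0.0]
pooled, importance = attentional_pool(regions, w_score)
print(len(pooled), round(sum(importance), 6))
```

In a real model the region descriptors would come from a backbone network and `w_score` would be trained end to end; here both are fixed toy values so the pooling arithmetic can be followed by hand.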

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Fine-grained Image Classification | CUB200 2011 (test) | Accuracy | 91.9 | 536 |
| Image Classification | Stanford Cars | Accuracy | 95.7 | 477 |
| Fine-grained Image Classification | Stanford Cars (test) | Accuracy | 95.7 | 348 |
| Image Classification | Aircraft | Accuracy | 94.1 | 302 |
| Fine-grained visual classification | FGVC-Aircraft (test) | Top-1 Acc | 94.9 | 287 |
| Image Classification | FGVC-Aircraft (test) | Accuracy | 94.9 | 231 |
| Fine-grained Image Classification | CUB-200 2011 | Accuracy | 91.8 | 222 |
| Fine-grained Image Classification | Stanford Cars | Accuracy | 95.7 | 206 |
| Fine-grained visual classification | NABirds (test) | Top-1 Accuracy | 91 | 157 |
| Fine-grained Image Classification | Stanford Dogs (test) | Accuracy | 96.1 | 117 |
Showing 10 of 27 rows
