Fully Convolutional Attention Networks for Fine-Grained Recognition

About

Fine-grained recognition is challenging because subtle, local inter-class differences must be distinguished despite large intra-class variations such as pose. A key to addressing this problem is to localize discriminative parts and extract pose-invariant features from them. However, ground-truth part annotations can be expensive to acquire, and parts are hard to define for many fine-grained classes. This work introduces Fully Convolutional Attention Networks (FCANs), a reinforcement learning framework that learns to glimpse local discriminative regions and adapts to different fine-grained domains. Compared to previous methods, our approach has three advantages: 1) the weakly supervised reinforcement learning procedure requires no expensive part annotations; 2) the fully convolutional architecture speeds up both training and testing; 3) the greedy reward strategy accelerates convergence of learning. We demonstrate the effectiveness of our method with extensive experiments on four challenging fine-grained benchmark datasets: CUB-200-2011, Stanford Dogs, Stanford Cars, and Food-101.

Xiao Liu, Tian Xia, Jiang Wang, Yi Yang, Feng Zhou, Yuanqing Lin • 2016

Related benchmarks

Task	Dataset	Result
Image Classification	Food-101	Accuracy 86.5	590
Fine-grained Image Classification	CUB200 2011 (test)	Accuracy 84.7	585
Fine-grained Image Classification	Stanford Cars (test)	Accuracy 93.1	372
Image Classification	Stanford Cars (test)	Accuracy 93.1	320
Fine-grained Image Classification	Stanford Cars	Accuracy 91.3	298
Image Classification	Stanford Dogs (test)	Top-1 Acc 84.2	140
Fine-grained Image Classification	Stanford Dogs (test)	Accuracy 88.9	124
Fine-grained Visual Categorization	Stanford Dogs	Accuracy 89	51
Fine-grained Visual Categorization	CUB-Birds	Accuracy 84.3	31
