
Learning Deep Bilinear Transformation for Fine-grained Image Representation

About

Bilinear feature transformation has shown state-of-the-art performance in learning fine-grained image representations. However, the computational cost of learning pairwise interactions between deep feature channels is prohibitively expensive, which restricts the use of this powerful transformation in deep neural networks. In this paper, we propose a deep bilinear transformation (DBT) block, which can be deeply stacked in convolutional neural networks to learn fine-grained image representations. The DBT block uniformly divides the input channels into several semantic groups. Since the bilinear transformation can then be computed as pairwise interactions within each group, the computational cost is greatly reduced. The output of each block is obtained by aggregating the intra-group bilinear features, together with residuals from the entire input features. The proposed network achieves new state-of-the-art results on several fine-grained image recognition benchmarks, including CUB-Bird, Stanford-Car, and FGVC-Aircraft.
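The core idea of the abstract can be illustrated with a minimal sketch: split the channels into groups and compute pairwise (outer-product) interactions only within each group, reducing the cost of full bilinear pooling from O(C²) to O(C²/G). The function below is a hypothetical NumPy illustration of this grouping trick, not the paper's actual DBT block (which is stacked inside a CNN and combined with residual connections).

```python
import numpy as np

def grouped_bilinear(x, num_groups):
    """Sketch of intra-group bilinear pooling (hypothetical helper).

    x: feature map of shape (C, N), where N = H * W spatial positions.
    The C channels are split into `num_groups` groups; pairwise channel
    interactions are computed only within each group, so the output has
    C^2 / num_groups entries instead of C^2.
    """
    c, n = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    g = c // num_groups  # channels per group
    feats = []
    for k in range(num_groups):
        xg = x[k * g:(k + 1) * g]      # (g, N) slice for one group
        b = (xg @ xg.T) / n            # (g, g) intra-group bilinear matrix,
                                       # averaged over spatial positions
        feats.append(b.reshape(-1))
    # aggregate the intra-group bilinear features into one vector
    return np.concatenate(feats)       # length num_groups * g^2 = C^2 / G

# 8 channels in 4 groups -> 4 blocks of 2x2 interactions = 16 features
y = grouped_bilinear(np.random.randn(8, 16), num_groups=4)
print(y.shape)  # (16,)
```

With G = 1 this reduces to ordinary bilinear pooling over all channels; larger G trades interaction coverage for compute, which is the efficiency lever the paper exploits.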

Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo · 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Fine-grained Image Classification | CUB200 2011 (test) | Accuracy | 88.1 | 536 |
| Fine-grained Image Classification | Stanford Cars (test) | Accuracy | 94.1 | 348 |
| Fine-grained Visual Classification | FGVC-Aircraft (test) | Top-1 Acc | 91.6 | 287 |
| Image Classification | CUB-200-2011 (test) | Top-1 Acc | 88.1 | 276 |
| Fine-grained Image Classification | CUB-200 2011 | Accuracy | 88.1 | 222 |
| Fine-grained Image Classification | Stanford Cars | Accuracy | 94.1 | 206 |
| Fine-grained Visual Categorization | Stanford Cars (test) | Accuracy | 94.5 | 110 |
| Fine-grained Visual Categorization | FGVCAircraft | Accuracy | 91.6 | 60 |
| Fine-grained Visual Categorization | CUB-Birds | Accuracy | 87.5 | 26 |
| Fine-grained Visual Categorization | CUB | Accuracy | 88.1 | 20 |

Showing 10 of 11 rows.
