Progressive Multi-task Anti-Noise Learning and Distilling Frameworks for Fine-grained Vehicle Recognition
About
Fine-grained vehicle recognition (FGVR) is a fundamental technology for intelligent transportation systems, but it is very difficult because of its inherent intra-class variation. Most previous FGVR studies focus only on the intra-class variation caused by different shooting angles, positions, etc., while the intra-class variation caused by image noise has received little attention. This paper proposes a progressive multi-task anti-noise learning (PMAL) framework and a progressive multi-task distilling (PMD) framework to solve the intra-class variation problem in FGVR caused by image noise. The PMAL framework achieves high recognition accuracy by treating image denoising as an additional task in image recognition and progressively forcing a model to learn noise invariance. The PMD framework transfers the knowledge of the PMAL-trained model into the original backbone network, producing a model with about the same recognition accuracy as the PMAL-trained model but without any additional overhead over the original backbone network. Combining the two frameworks, we obtain models that significantly exceed previous state-of-the-art methods in recognition accuracy on two widely used, standard FGVR datasets, namely Stanford Cars and CompCars, as well as on three additional surveillance image-based vehicle-type classification datasets, namely Beijing Institute of Technology (BIT)-Vehicle, Vehicle Type Image Data 2 (VTID2), and Vehicle Images Dataset for Make Model Recognition (VIDMMR), again without any additional overhead over the original backbone networks. The source code is available at https://github.com/Dichao-Liu/Anti-noise_FGVR
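To make the two ideas concrete, the sketch below shows, in plain Python, (a) a generic multi-task objective that adds a weighted denoising reconstruction term to a classification loss, in the spirit of treating denoising as an extra task, and (b) standard soft-target knowledge distillation (Hinton-style, temperature-scaled KL divergence) as one common way to transfer a trained model's knowledge into a plain backbone. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not specify the actual PMAL/PMD losses, weights, or progressive schedule, so the function names, `denoise_weight`, `temperature`, and the choice of MSE and KL are all hypothetical. The linked repository is the authoritative reference.

```python
import math
from collections.abc import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Classification loss: negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between a denoised output and its clean target."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multitask_loss(logits, label, denoised, clean, denoise_weight=1.0):
    """Hypothetical anti-noise multi-task objective (assumption, not PMAL itself):
    recognition loss plus a weighted image-denoising reconstruction loss."""
    return cross_entropy(logits, label) + denoise_weight * mse(denoised, clean)


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Standard soft-target distillation (assumed stand-in for PMD):
    T^2 * KL(teacher || student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    return temperature ** 2 * kl


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    print(multitask_loss(logits, 0, [0.1, 0.2], [0.0, 0.2], denoise_weight=0.5))
    print(distillation_loss([1.5, 0.2, -0.5], logits))
```

In this sketch the denoising branch only contributes a training signal; once a student backbone has been distilled, only its classification logits would be needed at inference, which is one way a model could avoid extra overhead over the original backbone (again an assumption about how the frameworks are deployed).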
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Fine-grained Image Classification | Stanford Cars (test) | Accuracy (%) | 97.3 | 348 |
| Fine-Grained Vehicle Recognition | CompCars | Accuracy (%) | 99.1 | 11 |
| Fine-Grained Vehicle Recognition | BIT-Vehicle | Accuracy (%) | 95.2 | 11 |