
Class Attention Transfer Based Knowledge Distillation

About

Previous knowledge distillation methods have shown impressive performance on model compression tasks; however, it is hard to explain how the knowledge they transfer helps improve the performance of the student network. In this work, we propose a knowledge distillation method that has both high interpretability and competitive performance. We first revisit the structure of mainstream CNN models and reveal that the capacity to identify class-discriminative regions of the input is critical for a CNN to perform classification. Furthermore, we demonstrate that this capacity can be obtained and enhanced by transferring class activation maps. Based on our findings, we propose class attention transfer based knowledge distillation (CAT-KD). Unlike previous KD methods, we explore and present several properties of the knowledge transferred by our method, which not only improve the interpretability of CAT-KD but also contribute to a better understanding of CNNs. While having high interpretability, CAT-KD achieves state-of-the-art performance on multiple benchmarks. Code is available at: https://github.com/GzyAftermath/CAT-KD.
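To make the idea of "transferring class activation maps" concrete, here is a minimal NumPy sketch. It assumes the standard class activation map formulation (the final linear classifier applied at every spatial position of the last convolutional feature map) and an illustrative loss that compares L2-normalized teacher and student maps. The function names, the tensor shapes, and the exact form of the loss are assumptions made for illustration; they are not taken from the authors' repository and may differ from the loss used in CAT-KD.

```python
import numpy as np


def class_activation_maps(features, fc_weight):
    """Compute one activation map per class.

    features:  (channels, H, W) output of the last conv layer (assumed layout).
    fc_weight: (num_classes, channels) weights of the final linear classifier.
    Returns:   (num_classes, H, W).
    """
    # Apply the classifier weights at every spatial position (a 1x1 convolution).
    return np.einsum("kc,chw->khw", fc_weight, features)


def normalize_maps(cams, eps=1e-12):
    """Flatten each class map and scale it to unit L2 norm."""
    flat = cams.reshape(cams.shape[0], -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    return flat / (norms + eps)


def cam_transfer_loss(teacher_cams, student_cams):
    """Illustrative loss: mean squared distance between normalized maps."""
    t = normalize_maps(teacher_cams)
    s = normalize_maps(student_cams)
    return float(np.mean(np.sum((t - s) ** 2, axis=1)))


# Hypothetical shapes: teacher and student may differ in channel count,
# but must share the spatial size and the number of classes.
rng = np.random.default_rng(0)
num_classes, height, width = 10, 4, 4
teacher_features = rng.normal(size=(64, height, width))
student_features = rng.normal(size=(16, height, width))
teacher_fc = rng.normal(size=(num_classes, 64))
student_fc = rng.normal(size=(num_classes, 16))

teacher_cams = class_activation_maps(teacher_features, teacher_fc)
student_cams = class_activation_maps(student_features, student_fc)
loss = cam_transfer_loss(teacher_cams, student_cams)
```

In this sketch, averaging each map over its spatial positions recovers the bias-free class logits of a network that uses global average pooling before the classifier, which is why the maps indicate which regions drive each class score. In training, a term like `loss` would typically be added to the usual classification loss with a weighting factor, which is again an assumption here.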

Ziyao Guo, Haonan Yan, Hui Li, Xiaodong Lin • 2023

Related benchmarks

Task | Dataset | Result | Rank
Image Classification | CIFAR-100 (test) | Accuracy 78.84 | 3518
Image Classification | ImageNet (val) | Top-1 Accuracy 71.26 | 188
Image Classification | ImageNet-1K | Top-1 Acc 72.24 | 75
Image Classification | ImageNet-1k (val) | Top-1 Acc 72.24 | 26
Image Classification | CIFAR-100 1.0 (val) | Top-1 Acc 76.91 | 18
Image Classification | ImageNet-1k (val) | Top-1 Acc 72.24 | 9

