
Class Attention Transfer Based Knowledge Distillation

About

Previous knowledge distillation methods have shown impressive performance on model compression tasks; however, it is hard to explain how the knowledge they transfer helps to improve the performance of the student network. In this work, we focus on proposing a knowledge distillation method that has both high interpretability and competitive performance. We first revisit the structure of mainstream CNN models and reveal that the capacity to identify class-discriminative regions of the input is critical for a CNN to perform classification. Furthermore, we demonstrate that this capacity can be obtained and enhanced by transferring class activation maps. Based on these findings, we propose class attention transfer based knowledge distillation (CAT-KD). Unlike previous KD methods, we explore and present several properties of the knowledge transferred by our method, which not only improve the interpretability of CAT-KD but also contribute to a better understanding of CNNs. While offering high interpretability, CAT-KD achieves state-of-the-art performance on multiple benchmarks. Code is available at: https://github.com/GzyAftermath/CAT-KD.

Ziyao Guo, Haonan Yan, Hui Li, Xiaodong Lin • 2023
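For concreteness, the sketch below shows one plausible way the class attention transfer described in the abstract could be implemented in PyTorch. The function names (`class_activation_maps`, `cat_loss`), the pooling size, and the choice of an MSE loss between L2-normalized, spatially pooled maps are illustrative assumptions, not the repository's exact implementation; see the linked code for the authors' version.

```python
import torch
import torch.nn.functional as F


def class_activation_maps(feature_map, fc_weight):
    """Compute per-class activation maps (CAMs) from the last conv feature map.

    feature_map: (B, C, H, W) activations before global average pooling.
    fc_weight:   (num_classes, C) weights of the final linear classifier.
    Returns:     (B, num_classes, H, W) class activation maps.
    """
    # Each CAM is a channel-weighted sum of the feature map, using the
    # classifier row for that class as the weights.
    return torch.einsum("bchw,kc->bkhw", feature_map, fc_weight)


def cat_loss(student_cam, teacher_cam, pool_size=2):
    """Hypothetical class-attention-transfer loss: MSE between spatially
    pooled, L2-normalized CAMs of the student and the teacher."""
    s = F.adaptive_avg_pool2d(student_cam, pool_size)
    t = F.adaptive_avg_pool2d(teacher_cam, pool_size)
    # Normalize each class's pooled map so only the attention pattern,
    # not its magnitude, is matched.
    s = F.normalize(s.flatten(2), dim=2)
    t = F.normalize(t.flatten(2), dim=2)
    return F.mse_loss(s, t)
```

In training, a loss of this form would be added to the student's standard cross-entropy objective, with the teacher's CAMs computed under `torch.no_grad()`.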

Related benchmarks

Task                          | Dataset            | Metric         | Result | Rank
------------------------------|--------------------|----------------|--------|-----
Image Classification          | CIFAR-100 (test)   | Accuracy       | 78.84  | 3518
Image Classification          | CIFAR-100 (val)    | Accuracy       | 76.91  | 776
Image Classification          | DTD                | Accuracy       | 62.4   | 542
Image Classification          | TinyImageNet (val) | --             | --     | 289
Image Classification          | ImageNet (val)     | Top-1 Accuracy | 71.26  | 188
Image Classification          | Stanford Dogs      | Accuracy       | 62.4   | 153
Image Classification          | CIFAR100           | Accuracy       | 72.11  | 102
Image Classification          | ImageNet-1K        | Top-1 Accuracy | 72.24  | 75
Image Classification          | ImageNet (val)     | Top-1 Accuracy | 72.24  | 55
Facial Expression Recognition | RAF-DB             | Accuracy       | 84.68  | 53

Showing 10 of 13 rows.

Other info

Code
