
Class Attention Transfer Based Knowledge Distillation

About

Previous knowledge distillation methods have shown impressive performance on model compression tasks; however, it is hard to explain how the knowledge they transfer helps to improve the performance of the student network. In this work, we focus on proposing a knowledge distillation method that has both high interpretability and competitive performance. We first revisit the structure of mainstream CNN models and reveal that the capacity to identify class-discriminative regions of the input is critical for a CNN to perform classification. Furthermore, we demonstrate that this capacity can be obtained and enhanced by transferring class activation maps. Based on these findings, we propose class attention transfer based knowledge distillation (CAT-KD). Unlike previous KD methods, we explore and present several properties of the knowledge transferred by our method, which not only improve the interpretability of CAT-KD but also contribute to a better understanding of CNNs. While offering high interpretability, CAT-KD achieves state-of-the-art performance on multiple benchmarks. Code is available at: https://github.com/GzyAftermath/CAT-KD.

Ziyao Guo, Haonan Yan, Hui Li, Xiaodong Lin • 2023
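For concreteness, the sketch below shows one plausible way the class attention transfer described in the abstract could be implemented in PyTorch. The function names (`class_activation_maps`, `cat_loss`), the pooling size, and the choice of an MSE loss between L2-normalized, spatially pooled maps are illustrative assumptions, not the repository's exact implementation; see the linked code for the authors' version.

```python
import torch
import torch.nn.functional as F


def class_activation_maps(feature_map, fc_weight):
    """Compute per-class activation maps (CAMs) from the last conv feature map.

    feature_map: (B, C, H, W) activations before global average pooling.
    fc_weight:   (num_classes, C) weights of the final linear classifier.
    Returns:     (B, num_classes, H, W) class activation maps.
    """
    # Each CAM is a channel-weighted sum of the feature map, using the
    # classifier row for that class as the weights.
    return torch.einsum("bchw,kc->bkhw", feature_map, fc_weight)


def cat_loss(student_cam, teacher_cam, pool_size=2):
    """Hypothetical class-attention-transfer loss: MSE between spatially
    pooled, L2-normalized CAMs of the student and the teacher."""
    s = F.adaptive_avg_pool2d(student_cam, pool_size)
    t = F.adaptive_avg_pool2d(teacher_cam, pool_size)
    # Normalize each class's pooled map so only the attention pattern,
    # not its magnitude, is matched.
    s = F.normalize(s.flatten(2), dim=2)
    t = F.normalize(t.flatten(2), dim=2)
    return F.mse_loss(s, t)
```

In training, a loss of this form would be added to the student's standard cross-entropy objective, with the teacher's CAMs computed under `torch.no_grad()`.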

Related benchmarks

Task                          | Dataset            | Metric         | Result | Rank
------------------------------|--------------------|----------------|--------|-----
Image Classification          | CIFAR-100 (test)   | Accuracy       | 78.84  | 3518
Image Classification          | CIFAR-100 (val)    | Accuracy       | 76.91  | 776
Image Classification          | DTD                | Accuracy       | 62.4   | 542
Image Classification          | TinyImageNet (val) | --             | --     | 289
Image Classification          | ImageNet (val)     | Top-1 Accuracy | 71.26  | 188
Image Classification          | Stanford Dogs      | Accuracy       | 62.4   | 153
Image Classification          | CIFAR100           | Accuracy       | 72.11  | 102
Image Classification          | ImageNet-1K        | Top-1 Accuracy | 72.24  | 75
Image Classification          | ImageNet (val)     | Top-1 Accuracy | 72.24  | 55
Facial Expression Recognition | RAF-DB             | Accuracy       | 84.68  | 53

Showing 10 of 13 rows.

Other info

Code
