
Cross-Layer Distillation with Semantic Calibration

About

Knowledge distillation is a technique to enhance the generalization ability of a student model by exploiting outputs from a teacher model. Recently, feature-map based variants explore knowledge transfer between manually assigned teacher-student pairs in intermediate layers for further improvement. However, layer semantics may vary across neural networks, and semantic mismatch in manual layer associations leads to performance degeneration through negative regularization. To address this issue, we propose Semantic Calibration for cross-layer Knowledge Distillation (SemCKD), which automatically assigns proper target layers of the teacher model to each student layer with an attention mechanism. With a learned attention distribution, each student layer distills knowledge contained in multiple teacher layers rather than a single fixed intermediate layer, yielding appropriate cross-layer supervision. We further provide a theoretical analysis of the association weights and conduct extensive experiments to demonstrate the effectiveness of our approach. Code is available at https://github.com/DefangChen/SemCKD.
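The mechanism the abstract describes (each student layer distilling from an attention-weighted mixture of teacher layers) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the attention weights here come from a simple dot-product similarity between pre-projected feature vectors, whereas the paper learns them end to end, and the function name `semckd_loss_sketch` is our own.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def semckd_loss_sketch(student_feats, teacher_feats):
    """Attention-weighted cross-layer distillation loss (sketch).

    student_feats / teacher_feats: lists of 1-D feature vectors,
    assumed already projected to a common dimension. For each student
    layer, attention weights over ALL teacher layers are computed from
    dot-product similarity (a proxy for the learned attention in the
    paper), then used to weight the pairwise MSE losses.
    """
    total = 0.0
    for f_s in student_feats:
        # Similarity logits between this student layer and every teacher layer.
        logits = np.array([float(f_s @ f_t) for f_t in teacher_feats])
        alpha = softmax(logits)  # attention distribution over teacher layers
        # Per-pair feature-matching loss, then attention-weighted sum.
        losses = np.array([np.mean((f_s - f_t) ** 2) for f_t in teacher_feats])
        total += float(alpha @ losses)
    return total
```

Because the weights form a distribution over all teacher layers, a student layer whose semantics match several teacher layers spreads its supervision across them instead of being forced onto one manually chosen pair.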

Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Yan Feng, Chun Chen • 2020

Related benchmarks

Task                          Dataset            Metric      Result   Rank
Image Classification          CIFAR-100 (test)   -           -        3518
Image Classification          ImageNet (test)    Top-1 Acc   71.41    235
Medical Image Classification  BTC                Accuracy    78.68    107
Medical Image Classification  BUSI               Accuracy    86.36    88
Medical Image Classification  COVID              Accuracy    80.79    54
Medical Image Classification  ISIC               Accuracy    77.18    39
Medical Image Classification  Chest Xray         Accuracy    94.42    21
