
Variational Information Distillation for Knowledge Transfer

About

Transferring knowledge from a teacher neural network pretrained on the same or a similar task can significantly improve the performance of a student neural network. Existing knowledge transfer approaches match the activations or the corresponding hand-crafted features of the teacher and the student networks. We propose an information-theoretic framework that formulates knowledge transfer as maximizing the mutual information between the teacher and the student networks. We compare our method with existing knowledge transfer methods on both knowledge distillation and transfer learning tasks and show that it consistently outperforms them. We further demonstrate the strength of our method on knowledge transfer across heterogeneous network architectures by transferring knowledge from a convolutional neural network (CNN) to a multi-layer perceptron (MLP) on CIFAR-10. The resulting MLP significantly outperforms state-of-the-art methods and achieves performance comparable to a CNN with a single convolutional layer.
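Concretely, the mutual-information view leads to a variational lower bound: the true conditional p(t|s) between teacher features t and student features s is intractable, so it is replaced by a tractable distribution q(t|s), giving I(t; s) >= H(t) + E[log q(t|s)]. Since H(t) does not depend on the student, the student is trained to maximize E[log q(t|s)] alongside its task loss. The PyTorch sketch below illustrates this bound for a single layer pair under the common choice of a Gaussian q(t|s) with a learned mean and per-channel variance; the class name, the 1x1-conv mean network, and the initialization value are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIDLoss(nn.Module):
    """Variational lower bound on I(teacher; student) for one layer pair.

    Approximates the intractable p(t|s) with a Gaussian q(t|s) whose mean
    is a learned function of the student feature and whose per-channel
    variance is a free parameter. Because H(t) is constant with respect to
    the student, maximizing the bound reduces to minimizing the Gaussian
    negative log-likelihood of the teacher features under q(t|s).
    Names and hyperparameters here are illustrative, not from the paper.
    """

    def __init__(self, student_channels, teacher_channels, init_var=5.0):
        super().__init__()
        # Mean network mu(s): a 1x1 conv mapping student width to teacher width.
        self.mean = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)
        # Unconstrained per-channel parameter; softplus keeps the variance positive.
        self.log_scale = nn.Parameter(
            torch.full((teacher_channels,), float(init_var)))

    def forward(self, student_feat, teacher_feat):
        mu = self.mean(student_feat)
        var = F.softplus(self.log_scale) + 1e-6      # sigma_c^2 > 0
        var = var.view(1, -1, 1, 1)                  # broadcast over (N, C, H, W)
        # Gaussian NLL per element, constants dropped:
        #   0.5 * log sigma_c^2 + (t - mu(s))^2 / (2 * sigma_c^2)
        nll = 0.5 * torch.log(var) + (teacher_feat - mu) ** 2 / (2.0 * var)
        return nll.mean()
```

In training, one such term per matched layer pair would be added to the student's ordinary cross-entropy loss with a weighting coefficient; the heteroscedastic variance lets the bound down-weight teacher channels that the student cannot predict well.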

Sungsoo Ahn, Shell Xu Hu, Andreas Damianou, Neil D. Lawrence, Zhenwen Dai • 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Image Classification | CIFAR-100 (test) | Accuracy | 74.11 | 3518 |
| Image Classification | CIFAR100 (test) | Top-1 Accuracy | 74.82 | 377 |
| Image Classification | TinyImageNet (test) | Accuracy | 36.09 | 366 |
| Image Classification | STL-10 (test) | Accuracy | 69.29 | 357 |
| Image Classification | ImageNet (test) | Top-1 Accuracy | 71.11 | 235 |
| Image Classification | CIFAR100 (test) | Test Accuracy | 75.78 | 147 |
| Image Classification | CIFAR-100 | Nominal Accuracy | 73.61 | 116 |
| Medical Image Classification | BTC | Accuracy | 78.17 | 107 |
| Medical Image Classification | BUSI | Accuracy | 85.06 | 88 |
| Medical Image Classification | COVID | Accuracy | 77.34 | 54 |

Showing 10 of 18 rows.
