
Learning Student-Friendly Teacher Networks for Knowledge Distillation

About

We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student. Contrary to most of the existing methods that rely on effective training of student models given pretrained teachers, we aim to learn the teacher models that are friendly to students and, consequently, more appropriate for knowledge transfer. In other words, at the time of optimizing a teacher model, the proposed algorithm learns the student branches jointly to obtain student-friendly representations. Since the main goal of our approach lies in training teacher models and the subsequent knowledge distillation procedure is straightforward, most of the existing knowledge distillation methods can adopt this technique to improve the performance of diverse student models in terms of accuracy and convergence speed. The proposed algorithm demonstrates outstanding accuracy in several well-known knowledge distillation techniques with various combinations of teacher and student models even in the case that their architectures are heterogeneous and there is no prior knowledge about student models at the time of training teacher networks.
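As a rough sketch of the core idea — and not the paper's exact architecture or loss — the teacher can be trained jointly with a lightweight student branch attached to an intermediate feature map, so that the shared representations remain easy for a student to mimic. The backbone, branch design, temperature T, and weight alpha below are illustrative assumptions in PyTorch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentFriendlyTeacher(nn.Module):
    """Teacher backbone with an auxiliary student branch attached to an
    intermediate feature map, trained jointly so the teacher learns
    student-friendly representations (illustrative layer sizes)."""
    def __init__(self, num_classes=100):
        super().__init__()
        # Hypothetical small conv teacher; a real setup would use e.g. a ResNet.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
        )
        self.teacher_head = nn.Sequential(
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, num_classes),
        )
        # Lightweight student branch that reuses the teacher stem's features.
        self.student_branch = nn.Sequential(
            nn.Conv2d(128, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes),
        )

    def forward(self, x):
        feat = self.stem(x)
        return self.teacher_head(feat), self.student_branch(feat)

def joint_loss(teacher_logits, student_logits, targets, T=4.0, alpha=1.0):
    # Both heads are supervised by the labels; the student branch is
    # additionally pulled toward the teacher's softened outputs, which in
    # turn encourages the shared features to stay transferable.
    ce = F.cross_entropy(teacher_logits, targets) + F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return ce + alpha * kd

# Toy training step on random CIFAR-100-sized inputs.
model = StudentFriendlyTeacher()
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 100, (8,))
t_logits, s_logits = model(x)
loss = joint_loss(t_logits, s_logits, y)
opt.zero_grad()
loss.backward()
opt.step()
```

After such joint training, the student branch can be discarded and any standard knowledge distillation procedure applied to the resulting teacher, which is why the approach composes with most existing distillation methods.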

Dae Young Park, Moon-Hyun Cha, Changwook Jeong, Dae Sin Kim, Bohyung Han • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
|------|---------|--------|------|
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy: 77.43 | 840 |
| Natural Language Understanding | GLUE (dev) | SST-2 Accuracy: 91.5 | 504 |
| Natural Language Understanding | GLUE (test) | SST-2 Accuracy: 92.7 | 416 |
| Image Classification | CIFAR-100 (test) | Top-1 Accuracy: 82.52 | 275 |
| Image Classification | CIFAR-100 | Student Accuracy: 79.11 | 42 |
| Image Classification | CIFAR-100 (test) | Accuracy: 77.23 | 32 |
| Image Classification | STL-10 (test) | Accuracy: 77.45 | 15 |
| Image Classification | TinyImageNet (test) | Accuracy: 42.41 | 15 |
