
Deep Mutual Learning

About

Model distillation is an effective and widely used technique to transfer knowledge from a teacher to a student network. The typical application is to transfer from a powerful large network or ensemble to a small network that is better suited to low-memory or fast-execution requirements. In this paper, we present a deep mutual learning (DML) strategy where, rather than one-way transfer between a static pre-defined teacher and a student, an ensemble of students learns collaboratively and teaches each other throughout the training process. Our experiments show that a variety of network architectures benefit from mutual learning and achieve compelling results on CIFAR-100 recognition and Market-1501 person re-identification benchmarks. Surprisingly, it is revealed that no prior powerful teacher network is necessary -- mutual learning of a collection of simple student networks works, and moreover outperforms distillation from a more powerful yet static teacher.
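The mutual-learning objective the abstract describes is straightforward to sketch: each student minimizes its usual supervised cross-entropy loss plus a KL-divergence "mimicry" term that pulls its predicted class probabilities toward each peer's. Below is a minimal PyTorch sketch of one training step under that reading; the `students` and `optimizers` lists, the `dml_step` helper, and the choice to update all peers from a single shared forward pass with detached peer predictions are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of a deep mutual learning (DML) training step, assuming a
# PyTorch setup. For student i with predictions p_i, the loss is
#   L_i = CE(p_i, y) + (1 / (K - 1)) * sum_{j != i} KL(p_j || p_i),
# which matches the two-network formulation in the abstract when K = 2.
import torch
import torch.nn.functional as F

def dml_step(students, optimizers, x, y):
    """One mutual-learning update on a batch (x, y) for K peer students.

    `students` and `optimizers` are hypothetical lists of nn.Module
    classifiers and their optimizers; any architectures can be mixed.
    """
    logits = [net(x) for net in students]
    log_probs = [F.log_softmax(l, dim=1) for l in logits]
    # Peers' probabilities are detached: each student treats the others'
    # current predictions as fixed soft targets for this step.
    probs = [F.softmax(l, dim=1).detach() for l in logits]
    k = len(students)
    for i in range(k):
        # Supervised loss against the ground-truth labels.
        loss = F.cross_entropy(logits[i], y)
        # Mimicry loss: average KL(p_j || p_i) over the K - 1 peers.
        # F.kl_div(input=log q, target=p) computes KL(p || q).
        for j in range(k):
            if j != i:
                loss = loss + F.kl_div(
                    log_probs[i], probs[j], reduction="batchmean"
                ) / (k - 1)
        optimizers[i].zero_grad()
        loss.backward()
        optimizers[i].step()
```

In the paper the networks are updated in turn, with peer predictions recomputed between updates; the single shared forward pass above is a simplification that keeps the sketch short. For K = 2 this reduces to L_1 = CE(p_1, y) + KL(p_2 || p_1), and symmetrically for the second network.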

Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu • 2017

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-100 (test) | - | - | 3518 |
| Image Classification | CIFAR-10 (test) | - | - | 3381 |
| Person Re-Identification | Market-1501 (test) | Rank-1 Accuracy | 89.34 | 1264 |
| Image Classification | ImageNet (val) | Top-1 Accuracy | 71.35 | 1206 |
| Person Re-Identification | Market-1501 | mAP | 68.8 | 999 |
| Image Classification | CIFAR-10 (test) | Accuracy | 87.71 | 906 |
| Image Classification | CIFAR-100 (val) | Accuracy | 73.58 | 661 |
| Natural Language Understanding | GLUE (dev) | SST-2 Accuracy | 93.3 | 504 |
| Natural Language Understanding | GLUE (test) | SST-2 Accuracy | 92.7 | 416 |
| Image Classification | ImageNet (val) | Accuracy | 69.82 | 300 |

Showing 10 of 27 rows.
