
Knowledge Distillation by On-the-Fly Native Ensemble

About

Knowledge distillation is effective for training small and generalisable network models that meet low-memory and fast-inference requirements. Existing offline distillation methods rely on a strong pre-trained teacher, which enables favourable knowledge discovery and transfer but requires a complex two-phase training procedure. Online counterparts address this limitation at the price of lacking a high-capacity teacher. In this work, we present an On-the-fly Native Ensemble (ONE) strategy for one-stage online distillation. Specifically, ONE trains only a single multi-branch network while simultaneously establishing a strong teacher on-the-fly to enhance the learning of the target network. Extensive evaluations show that ONE improves the generalisation performance of a variety of deep neural networks more significantly than alternative methods on four image classification datasets: CIFAR10, CIFAR100, SVHN, and ImageNet, whilst retaining computational efficiency advantages.

Xu Lan, Xiatian Zhu, Shaogang Gong• 2018
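
To make the idea concrete, below is a minimal PyTorch sketch of the ONE setup, not the authors' implementation: a shared trunk feeds several branch classifiers, a learned gate combines the branch logits into an on-the-fly ensemble teacher, and each branch is trained with cross-entropy plus a KL distillation term from that teacher. The names (ONENet, one_loss), layer sizes, number of branches, and temperature are illustrative assumptions.

```python
# A minimal sketch of the ONE training idea (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ONENet(nn.Module):
    def __init__(self, num_classes=100, num_branches=3):
        super().__init__()
        # Shared low-level trunk (the paper uses a ResNet-style stem; this is a toy stand-in).
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        feat_dim = 64 * 4 * 4
        # Independent high-level branches, each ending in its own classifier.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Flatten(), nn.Linear(feat_dim, 256), nn.ReLU(),
                          nn.Linear(256, num_classes))
            for _ in range(num_branches)
        )
        # Gating module producing one importance weight per branch.
        self.gate = nn.Sequential(nn.Flatten(), nn.Linear(feat_dim, num_branches))

    def forward(self, x):
        shared = self.trunk(x)
        branch_logits = torch.stack([b(shared) for b in self.branches], dim=1)  # (B, K, C)
        gate = F.softmax(self.gate(shared), dim=1).unsqueeze(-1)                # (B, K, 1)
        teacher_logits = (gate * branch_logits).sum(dim=1)                      # on-the-fly teacher, (B, C)
        return branch_logits, teacher_logits

def one_loss(branch_logits, teacher_logits, targets, T=3.0):
    """Cross-entropy on every branch and on the gated ensemble, plus KL
    distillation from the (detached) teacher to each branch at temperature T."""
    ce = F.cross_entropy(teacher_logits, targets)
    kd = 0.0
    for k in range(branch_logits.size(1)):
        logits_k = branch_logits[:, k]
        ce = ce + F.cross_entropy(logits_k, targets)
        kd = kd + F.kl_div(F.log_softmax(logits_k / T, dim=1),
                           F.softmax(teacher_logits.detach() / T, dim=1),
                           reduction="batchmean") * (T * T)
    return ce + kd

# Usage: one forward pass yields all branch logits and the teacher; one combined loss.
model = ONENet()
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 100, (8,))
branch_logits, teacher_logits = model(images)
loss = one_loss(branch_logits, teacher_logits, labels)
loss.backward()
```

Because the teacher is just the gated combination of the branches themselves, the whole model is trained in a single stage; at test time one branch (or the ensemble) can be kept as the deployed network.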

Related benchmarks

Task | Dataset | Result | Rank
Image Classification | CIFAR-100 (test) | - | 3518
Image Classification | CIFAR-10 (test) | - | 3381
Image Classification | ImageNet-1k (val) | - | 1453
Natural Language Understanding | GLUE (dev) | SST-2 (Acc): 93.1 | 504
Image Classification | ImageNet (val) | Accuracy: 70.18 | 300
Hyperspectral Image Classification | Pavia University (test) | Average Accuracy (AA): 78.88 | 96
Hyperspectral Image Classification | Indian Pines (test) | Overall Accuracy (OA): 71.78 | 83
Hyperspectral Image Classification | Pavia University (PU) HU-to-PU (test) | Overall Accuracy (OA): 0.7942 | 23
Hyperspectral Image Classification | Indian Pines to Houston Knowledge Transfer (test) | Overall Accuracy (OA): 81.73 | 15
Image Classification | ImageNet (val) | Top-1 Error: 29.45 | 12
Showing 10 of 12 rows
