Knowledge Distillation by On-the-Fly Native Ensemble
About
Knowledge distillation is effective for training small and generalisable network models that meet low-memory and fast-execution requirements. Existing offline distillation methods rely on a strong pre-trained teacher, which enables favourable knowledge discovery and transfer but requires a complex two-phase training procedure. Online counterparts address this limitation at the price of lacking a high-capacity teacher. In this work, we present an On-the-fly Native Ensemble (ONE) strategy for one-stage online distillation. Specifically, ONE trains only a single multi-branch network while simultaneously establishing a strong teacher on-the-fly to enhance the learning of the target network. Extensive evaluations show that ONE improves the generalisation performance of a variety of deep neural networks more significantly than alternative methods on four image classification datasets: CIFAR10, CIFAR100, SVHN, and ImageNet, whilst also offering computational efficiency advantages.
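The core idea can be expressed as a single training loss: a gating module combines the logits of all branches into an ensemble teacher on-the-fly, and each branch is trained on the hard labels while also distilling from that teacher. The following is a minimal PyTorch sketch of such a loss, not the authors' released implementation; the function name `one_distillation_loss`, the argument layout, the detached teacher, and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def one_distillation_loss(branch_logits, gate_logits, targets, temperature=3.0):
    """Sketch of a ONE-style loss for a multi-branch network.

    branch_logits: list of m tensors, each (batch, num_classes), one per branch
    gate_logits:   tensor (batch, m), gating scores used to build the teacher
    targets:       tensor (batch,) of ground-truth class indices
    """
    # Stack branch logits into shape (batch, m, num_classes)
    logits = torch.stack(branch_logits, dim=1)

    # Gated ensemble teacher: weighted sum of branch logits, built on-the-fly
    gates = F.softmax(gate_logits, dim=1)                     # (batch, m)
    teacher_logits = (gates.unsqueeze(-1) * logits).sum(dim=1)

    # Hard-label cross-entropy for every branch and for the teacher ensemble
    ce = sum(F.cross_entropy(b, targets) for b in branch_logits)
    ce = ce + F.cross_entropy(teacher_logits, targets)

    # Distill the teacher's softened distribution back into each branch;
    # the KL term is scaled by T^2, as is standard in distillation.
    T = temperature
    teacher_soft = F.softmax(teacher_logits.detach() / T, dim=1)
    kl = sum(
        F.kl_div(F.log_softmax(b / T, dim=1), teacher_soft,
                 reduction="batchmean") * T * T
        for b in branch_logits
    )
    return ce + kl
```

In this sketch the gating weights are learned jointly with the branches through the teacher's cross-entropy term, while detaching the teacher inside the KL term (a common choice in distillation) stops the branches from dragging the teacher toward their own predictions.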
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | CIFAR-100 (test) | -- | 3518 |
| Image Classification | CIFAR-10 (test) | -- | 3381 |
| Image Classification | ImageNet-1k (val) | -- | 1453 |
| Natural Language Understanding | GLUE (dev) | SST-2 (Acc): 93.1 | 504 |
| Image Classification | ImageNet (val) | Accuracy: 70.18 | 300 |
| Hyperspectral Image Classification | Pavia University (test) | Average Accuracy (AA): 78.88 | 96 |
| Hyperspectral Image Classification | Indian Pines (test) | Overall Accuracy (OA): 71.78 | 83 |
| Hyperspectral Image Classification | Pavia University (PU) HU-to-PU (test) | Overall Accuracy (OA): 0.7942 | 23 |
| Hyperspectral Image Classification | Indian Pines to Houston Knowledge Transfer (test) | Overall Accuracy (OA): 81.73 | 15 |