Aligning Logits Generatively for Principled Black-Box Knowledge Distillation
About
Black-Box Knowledge Distillation (B2KD) is a formalized problem for cloud-to-edge model compression in which the data and models hosted on the server remain invisible. B2KD faces challenges such as limited Internet exchange and the edge-cloud disparity of data distributions. In this paper, we formalize a two-step workflow consisting of deprivatization and distillation, and theoretically provide a new optimization direction, from logits to cell boundary, that differs from direct logits alignment. Guided by this, we propose a new method, Mapping-Emulation KD (MEKD), that distills a black-box cumbersome model into a lightweight one. Our method treats soft and hard responses identically, and consists of: 1) deprivatization: emulating the inverse mapping of the teacher function with a generator; and 2) distillation: aligning the low-dimensional logits of the teacher and student models by reducing the distance between high-dimensional image points. For different teacher-student pairs, our method yields inspiring distillation performance on various benchmarks and outperforms previous state-of-the-art approaches.
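The two-step workflow can be sketched on a linear toy problem. Everything below is an illustrative assumption, not the paper's architecture: the teacher, student, and generator are plain linear maps, the generator is fit by least squares rather than trained adversarially, and the "images" are random vectors. The sketch only shows the mechanism: query the black-box teacher, emulate its inverse mapping with a generator, then train the student by shrinking the distance between the image points the generator produces from student and teacher logits.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 16, 4, 512  # image dim, logit dim, number of queries

# Black-box "teacher": a fixed linear map from images to logits.
# (A toy stand-in for the opaque cloud model, which only answers queries.)
W_t = rng.normal(size=(k, d))

def teacher(x):
    return x @ W_t.T

# Step 1 -- deprivatization: emulate the inverse mapping of the teacher
# with a generator fit on queried (logit, image) pairs. Here the
# generator is just a linear least-squares fit.
X = rng.normal(size=(n, d))
Z_t = teacher(X)
G, *_ = np.linalg.lstsq(Z_t, X, rcond=None)  # images ~= logits @ G

# Step 2 -- distillation: align the student's low-dimensional logits
# with the teacher's by reducing the distance between the
# high-dimensional image points the generator maps them to.
W_s = rng.normal(size=(k, d))  # lightweight linear student

def image_loss(W):
    diff = (X @ W.T - Z_t) @ G  # gap between generated image points
    return np.mean(np.sum(diff ** 2, axis=1))

lr, loss0 = 0.5, image_loss(W_s)
for _ in range(300):
    D = (X @ W_s.T - Z_t) @ G           # (n, d) image-space residual
    grad = (2.0 / n) * (D @ G.T).T @ X  # dL/dW_s
    W_s -= lr * grad

print(f"image-space loss: {loss0:.3f} -> {image_loss(W_s):.6f}")
```

Because the fitted generator is (approximately) injective on the logit space, driving the image-space gap to zero also drives the student's logits toward the teacher's, which is the intuition behind distilling in the high-dimensional image space.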
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-100 (test) | -- | -- | 3518 |
| Image Classification | MNIST (test) | Accuracy | 99.45 | 882 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 61.21 | 840 |
| Image Classification | ImageNet-1K | Top-1 Accuracy | 61.21 | 836 |
| Image Classification | CIFAR-100 | Top-1 Accuracy | 67.36 | 622 |
| Image Classification | CIFAR-10 | -- | -- | 507 |
| Image Classification | MNIST | -- | -- | 395 |
| Image Classification | TinyImageNet (test) | -- | -- | 366 |
| Image Classification | Tiny-ImageNet | Top-1 Accuracy | 54.93 | 143 |
| Image Classification | SVHN (test) | Top-1 Accuracy | 89.21 | 26 |