Aligning Logits Generatively for Principled Black-Box Knowledge Distillation
About
Black-Box Knowledge Distillation (B2KD) is a formalized problem for cloud-to-edge model compression in which the data and models hosted on the server remain invisible. B2KD faces challenges such as limited Internet exchange and the edge-cloud disparity of data distributions. In this paper, we formalize a two-step workflow consisting of deprivatization and distillation, and theoretically provide a new optimization direction, from logits to cell boundary, that differs from direct logits alignment. Guided by this, we propose a new method, Mapping-Emulation KD (MEKD), that distills a black-box cumbersome model into a lightweight one. Our method treats soft and hard responses identically, and consists of: 1) deprivatization: emulating the inverse mapping of the teacher function with a generator; and 2) distillation: aligning the low-dimensional logits of the teacher and student models by reducing the distance between high-dimensional image points. For different teacher-student pairs, our method yields inspiring distillation performance on various benchmarks and outperforms previous state-of-the-art approaches.
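The two-step workflow can be sketched on a linear toy problem. Everything below is an illustrative assumption, not the paper's architecture: the teacher, student, and generator are plain linear maps, the generator is fit by least squares rather than trained adversarially, and the "images" are random vectors. The sketch only shows the mechanism: query the black-box teacher, emulate its inverse mapping with a generator, then train the student by shrinking the distance between the image points the generator produces from student and teacher logits.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 16, 4, 512  # image dim, logit dim, number of queries

# Black-box "teacher": a fixed linear map from images to logits.
# (A toy stand-in for the opaque cloud model, which only answers queries.)
W_t = rng.normal(size=(k, d))

def teacher(x):
    return x @ W_t.T

# Step 1 -- deprivatization: emulate the inverse mapping of the teacher
# with a generator fit on queried (logit, image) pairs. Here the
# generator is just a linear least-squares fit.
X = rng.normal(size=(n, d))
Z_t = teacher(X)
G, *_ = np.linalg.lstsq(Z_t, X, rcond=None)  # images ~= logits @ G

# Step 2 -- distillation: align the student's low-dimensional logits
# with the teacher's by reducing the distance between the
# high-dimensional image points the generator maps them to.
W_s = rng.normal(size=(k, d))  # lightweight linear student

def image_loss(W):
    diff = (X @ W.T - Z_t) @ G  # gap between generated image points
    return np.mean(np.sum(diff ** 2, axis=1))

lr, loss0 = 0.5, image_loss(W_s)
for _ in range(300):
    D = (X @ W_s.T - Z_t) @ G           # (n, d) image-space residual
    grad = (2.0 / n) * (D @ G.T).T @ X  # dL/dW_s
    W_s -= lr * grad

print(f"image-space loss: {loss0:.3f} -> {image_loss(W_s):.6f}")
```

Because the fitted generator is (approximately) injective on the logit space, driving the image-space gap to zero also drives the student's logits toward the teacher's, which is the intuition behind distilling in the high-dimensional image space.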
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-100 (test) | -- | -- | 3518 |
| Image Classification | MNIST (test) | Accuracy | 99.45 | 882 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 61.21 | 840 |
| Image Classification | ImageNet-1K | Top-1 Accuracy | 61.21 | 836 |
| Image Classification | CIFAR-100 | Top-1 Accuracy | 67.36 | 622 |
| Image Classification | CIFAR-10 | -- | -- | 507 |
| Image Classification | MNIST | -- | -- | 395 |
| Image Classification | TinyImageNet (test) | -- | -- | 366 |
| Image Classification | Tiny-ImageNet | Top-1 Accuracy | 54.93 | 143 |
| Image Classification | SVHN (test) | Top-1 Accuracy | 89.21 | 26 |