Correlation Congruence for Knowledge Distillation
About
Most teacher-student frameworks based on knowledge distillation (KD) rely on a strong congruence constraint at the instance level, but usually ignore the correlation between multiple instances, which is also valuable for knowledge transfer. In this work, we propose a new framework named correlation congruence for knowledge distillation (CCKD), which transfers not only instance-level information but also the correlation between instances. Furthermore, a generalized kernel method based on Taylor series expansion is proposed to better capture the correlation between instances. Empirical experiments and ablation studies on image classification tasks (including CIFAR-100 and ImageNet-1K) and metric learning tasks (including ReID and face recognition) show that the proposed CCKD substantially outperforms the original KD and achieves state-of-the-art accuracy compared with other KD-based methods. CCKD can be easily deployed in most teacher-student frameworks, such as KD and hint-based learning methods.
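The idea above can be sketched in a few lines: build a pairwise correlation matrix over a batch of embeddings with an RBF kernel approximated by a truncated Taylor series, then penalize the mismatch between the teacher's and student's matrices. This is a minimal numpy sketch, not the paper's implementation; the function names, the `gamma` bandwidth, and the truncation `order` are illustrative assumptions.

```python
import math
import numpy as np

def correlation_matrix(feats, gamma=0.4, order=2):
    """Pairwise RBF-kernel correlations, via a truncated Taylor series.

    For l2-normalized rows x, y: k(x, y) = exp(-gamma * ||x - y||^2)
                                         = exp(-2*gamma) * exp(2*gamma * <x, y>),
    and exp(2*gamma * <x, y>) is expanded to `order` Taylor terms.
    """
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # l2-normalize rows
    sim = feats @ feats.T                                          # pairwise inner products
    # Elementwise truncated Taylor expansion of exp(2*gamma*sim).
    approx = sum((2.0 * gamma * sim) ** p / math.factorial(p) for p in range(order + 1))
    return np.exp(-2.0 * gamma) * approx

def cc_loss(teacher_feats, student_feats):
    """Correlation congruence: MSE between teacher and student correlation matrices."""
    ct = correlation_matrix(teacher_feats)
    cs = correlation_matrix(student_feats)
    b = teacher_feats.shape[0]
    return np.sum((ct - cs) ** 2) / (b * b)
```

In practice this term would be added, with a weighting coefficient, to the usual instance-level KD loss between teacher and student logits.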
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | CIFAR-100 (test) | Accuracy: 73.56 | 3518 |
| Image Classification | ImageNet-1K (val) | -- | 1453 |
| Image Classification | ImageNet-1K | Top-1 Acc: 70.79 | 836 |
| Image Classification | TinyImageNet (test) | Accuracy: 36.43 | 366 |
| Image Classification | STL-10 (test) | Accuracy: 69.13 | 357 |
| Image Classification | ImageNet (val) | -- | 300 |
| Image Classification | ImageNet (val) | -- | 188 |
| Image Classification | CIFAR-100 | Average Accuracy: 73.56 | 121 |
| Image Classification | DomainNet | Average Accuracy: 33.86 | 58 |
| Video Classification | Kinetics-400 v1 (val) | Top-1 Acc: 68.52 | 35 |