Curriculum Temperature for Knowledge Distillation

About

Most existing distillation methods ignore the flexible role of the temperature in the loss function and fix it as a hyper-parameter that can be decided by an inefficient grid search. In general, the temperature controls the discrepancy between two distributions and can faithfully determine the difficulty level of the distillation task. Keeping a constant temperature, i.e., a fixed level of task difficulty, is usually sub-optimal for a growing student during its progressive learning stages. In this paper, we propose a simple curriculum-based technique, termed Curriculum Temperature for Knowledge Distillation (CTKD), which controls the task difficulty level during the student's learning career through a dynamic and learnable temperature. Specifically, following an easy-to-hard curriculum, we gradually increase the distillation loss w.r.t. the temperature, leading to increased distillation difficulty in an adversarial manner. As an easy-to-use plug-in technique, CTKD can be seamlessly integrated into existing knowledge distillation frameworks and brings general improvements at a negligible additional computation cost. Extensive experiments on CIFAR-100, ImageNet-2012, and MS-COCO demonstrate the effectiveness of our method. Our code is available at https://github.com/zhengli97/CTKD.
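The idea in the abstract of a temperature that is learned adversarially under an easy-to-hard curriculum can be illustrated with a small, dependency-free sketch. It is not the authors' implementation (see the linked repository for that). The linear ramp in `curriculum_weight`, the finite-difference gradient, the learning rate, and the temperature floor of 1.0 are all assumptions made here for illustration. The `T^2` scaling of the KL term follows the common distillation convention. Only the temperature side of the min-max game is shown: the temperature takes a gradient ascent step on the distillation loss, scaled by a curriculum weight that grows over epochs, while the student would separately minimize the same loss.

```python
import math


def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def curriculum_weight(epoch, ramp_epochs=10):
    """Hypothetical linear ramp from 0 to 1 (assumed schedule, not from the source)."""
    return min(1.0, epoch / ramp_epochs)


def temperature_step(student_logits, teacher_logits, temperature, epoch,
                     lr=0.1, eps=1e-4):
    """One adversarial update: move the temperature uphill on the loss.

    The gradient is estimated by central differences to keep the sketch
    free of autograd dependencies (an assumption of this example).
    """
    grad = (kd_loss(student_logits, teacher_logits, temperature + eps)
            - kd_loss(student_logits, teacher_logits, temperature - eps)) / (2 * eps)
    new_temperature = temperature + lr * curriculum_weight(epoch) * grad
    return max(new_temperature, 1.0)  # assumed floor keeps the temperature positive


if __name__ == "__main__":
    student = [1.0, 0.5, -0.5]
    teacher = [3.0, 1.0, -2.0]
    temperature = 4.0
    for epoch in range(0, 12, 2):
        temperature = temperature_step(student, teacher, temperature, epoch)
        print(epoch, round(temperature, 4),
              round(kd_loss(student, teacher, temperature), 4))
```

At epoch 0 the curriculum weight is zero, so the temperature stays where it started and the task is at its easiest. As the weight grows, each step pushes the temperature in the direction that makes the distillation loss larger for the current student.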

Zheng Li, Xiang Li, Lingfeng Yang, Borui Zhao, Renjie Song, Lei Luo, Jun Li, Jian Yang • 2022

Related benchmarks

Task	Dataset	Result
Image Classification	CIFAR-100 (test)	--	3518
Image Classification	ImageNet (val)	--	300
Image Classification	ImageNet (val)	Top-1 Accuracy 71.38	163
Image Classification	ImageNet-1K	Top-1 Acc 71.32	75
Image Classification	ImageNet-1k (val)	Top-1 Acc 72.87	26
Image Classification	CIFAR-100 1.0 (val)	Top-1 Acc 73.39	18
