
Masked Autoencoders Enable Efficient Knowledge Distillers

About

This paper studies the potential of distilling knowledge from pre-trained models, especially Masked Autoencoders. Our approach is simple: in addition to optimizing the pixel reconstruction loss on masked inputs, we minimize the distance between the intermediate feature map of the teacher model and that of the student model. This design leads to a computationally efficient knowledge distillation framework, given 1) only a small visible subset of patches is used, and 2) the (cumbersome) teacher model only needs to be partially executed, i.e., inputs are forward-propagated through only the first few layers to obtain intermediate feature maps. Compared to directly distilling fine-tuned models, distilling pre-trained models substantially improves downstream performance. For example, by distilling the knowledge from an MAE pre-trained ViT-L into a ViT-B, our method achieves 84.0% ImageNet top-1 accuracy, outperforming the baseline of directly distilling a fine-tuned ViT-L by 1.2%. More intriguingly, our method can robustly distill knowledge from teacher models even with extremely high masking ratios: e.g., with a 95% masking ratio, where merely TEN patches are visible during distillation, our ViT-B competitively attains a top-1 ImageNet accuracy of 83.6%; surprisingly, it can still secure 82.4% top-1 ImageNet accuracy by aggressively training with just FOUR visible patches (98% masking ratio). The code and models are publicly available at https://github.com/UCSC-VLAA/DMAE.
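To make the objective concrete, below is a minimal PyTorch sketch of one training step under the setup the abstract describes. All module names, dimensions, and the smooth-L1 distance are illustrative assumptions, not the authors' implementation (see the repository above for the real code): both models see only the same small visible subset of patches, the teacher is executed only through its first few layers without gradients, and the total loss is the pixel reconstruction loss plus a weighted feature-distance term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the models (hypothetical sizes; the paper distills an
# MAE pre-trained ViT-L teacher into a ViT-B student).
patch_dim, d_student, d_teacher, n_patches = 48, 64, 96, 196

student_embed = nn.Linear(patch_dim, d_student)   # student patch embedding
student_enc = nn.TransformerEncoderLayer(d_student, nhead=4, batch_first=True)
student_dec = nn.Linear(d_student, patch_dim)     # stands in for the MAE decoder
teacher_embed = nn.Linear(patch_dim, d_teacher)   # frozen teacher patch embedding
teacher_block = nn.TransformerEncoderLayer(d_teacher, nhead=4, batch_first=True)
proj = nn.Linear(d_student, d_teacher)            # aligns student/teacher feature dims

def distill_step(patches, mask_ratio=0.95, lam=1.0):
    B, N, D = patches.shape
    # 95% masking of 196 patches leaves ~10 visible tokens.
    n_keep = max(1, round(N * (1 - mask_ratio)))
    ids = torch.rand(B, N).argsort(dim=1)[:, :n_keep]  # same visible subset for both models
    vis = torch.gather(patches, 1, ids[:, :, None].expand(-1, -1, D))

    # Student runs on the visible patches only.
    s_feat = student_enc(student_embed(vis))

    # Teacher is only partially executed: the visible patches pass through
    # its first few layers (one toy block here), with no gradients.
    with torch.no_grad():
        t_feat = teacher_block(teacher_embed(vis))

    # Feature distillation: distance between aligned intermediate feature maps.
    feat_loss = F.smooth_l1_loss(proj(s_feat), t_feat)

    # MAE-style pixel reconstruction loss. (Simplification: real MAE
    # reconstructs the *masked* patches via a decoder with mask tokens;
    # here we reconstruct the visible ones for brevity.)
    recon_loss = F.mse_loss(student_dec(s_feat), vis)

    return recon_loss + lam * feat_loss

loss = distill_step(torch.randn(2, n_patches, patch_dim))
loss.backward()
```

Because only the visible tokens are processed and the teacher stops after its early layers, the per-step cost shrinks roughly with the visible fraction, which is what makes the extreme masking ratios quoted above practical.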

Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie • 2022

Related benchmarks

Task                  | Dataset           | Metric    | Result | Rank
Semantic Segmentation | ADE20K            | mIoU      | 32.64  | 936
Image Classification  | ImageNet-1K       | Top-1 Acc | 84.0   | 836
Image Classification  | Stanford Cars     | --        | --     | 477
Image Classification  | CIFAR-100         | Accuracy  | 77.36  | 331
Image Classification  | iNaturalist 2019  | Top-1 Acc | 58.84  | 98
Image Classification  | CUB-200           | Accuracy  | 60.84  | 92
Image Classification  | Oxford Flowers    | Top-1 Acc | 83.18  | 78
Image Classification  | ImageNet-1K       | Top-1 Acc | 73.95  | 75
Image Classification  | ImageNet-1K (1%)  | Top-1 Acc | 50.3   | 49
Image Generation      | ImageNet-1K       | FID       | 15.5   | 42

Showing 10 of 15 rows.

Other info

Code: https://github.com/UCSC-VLAA/DMAE
