UNIC: Universal Classification Models via Multi-teacher Distillation

About

Pretrained models have become a commodity and offer strong results on a broad range of tasks. In this work, we focus on classification and seek to learn a unique encoder able to take from several complementary pretrained models. We aim at even stronger generalization across a variety of classification tasks. We propose to learn such an encoder via multi-teacher distillation. We first thoroughly analyse standard distillation when driven by multiple strong teachers with complementary strengths. Guided by this analysis, we gradually propose improvements to the basic distillation setup. Among those, we enrich the architecture of the encoder with a ladder of expendable projectors, which increases the impact of intermediate features during distillation, and we introduce teacher dropping, a regularization mechanism that better balances the teachers' influence. Our final distillation strategy leads to student models of the same capacity as any of the teachers, while retaining or improving upon the performance of the best teacher for each task. Project page and code: https://europe.naverlabs.com/unic

Mert Bulent Sariyildiz, Philippe Weinzaepfel, Thomas Lucas, Diane Larlus, Yannis Kalantidis • 2024
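
The abstract describes two key ingredients: per-teacher projectors on top of the student encoder, and teacher dropping as a regularizer. The sketch below is a minimal, illustrative PyTorch rendering of one multi-teacher distillation step under those ideas. Everything in it is an assumption for clarity: the names (TeacherProjector, distillation_step), the single-layer-tap MLP projector, the cosine feature loss, and the drop_prob value are not the authors' implementation, and UNIC's actual ladder of projectors taps intermediate encoder layers; the exact losses and schedules are in the code linked above.

import random

import torch
import torch.nn as nn
import torch.nn.functional as F


class TeacherProjector(nn.Module):
    # Hypothetical per-teacher projection head. In the paper, a "ladder" of
    # such projectors taps intermediate layers of the student encoder; all
    # projectors are expendable, i.e. discarded once distillation is done.
    def __init__(self, student_dim: int, teacher_dim: int, hidden_dim: int = 2048):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(student_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, teacher_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)


def distillation_step(student: nn.Module,
                      projectors: nn.ModuleDict,
                      teachers: dict,
                      images: torch.Tensor,
                      drop_prob: float = 0.5) -> torch.Tensor:
    # One multi-teacher distillation step. The cosine feature-matching loss
    # here is an illustrative stand-in for the paper's distillation objective.
    feats = student(images)                      # (B, D_student)
    per_teacher_losses = []
    for name, teacher in teachers.items():
        with torch.no_grad():                    # teachers stay frozen
            target = teacher(images)             # (B, D_teacher)
        pred = projectors[name](feats)           # project into teacher space
        loss = 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()
        per_teacher_losses.append(loss)

    # Teacher dropping: randomly ignore each teacher's loss this step so that
    # no single strong teacher dominates the gradients; keep at least one.
    kept = [l for l in per_teacher_losses if random.random() > drop_prob]
    if not kept:
        kept = [random.choice(per_teacher_losses)]
    return torch.stack(kept).mean()

At inference time only the student encoder is kept: the projectors exist solely to give each teacher its own target space during distillation, which is what makes them expendable afterwards.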

Related benchmarks

Task                       Dataset         Metric  Result  Rank
Semantic Segmentation      ADE20K          mIoU    48.3    936
Semantic Segmentation      Pascal Context  mIoU    81.82   111
Semantic Segmentation      NYUD v2         mIoU    58.56   96
Semantic Segmentation      Pascal Context  mIoU    81.82   43
Saliency Detection         Pascal Context  maxF    81.84   21
Surface Normal Estimation  Pascal Context  mErr    15.78   21
Surface Normal Estimation  NYUD            mErr    19.34   21
Semantic Segmentation      NYUD            mIoU    58.56   17
Depth Estimation           NYU V2          RMSE    0.4916  15
Human Parsing              Pascal Context  mIoU    72.24   11

Showing 10 of 14 rows.
