
WAVE: Weight Templates for Adaptive Initialization of Variable-sized Models

About

The growing complexity of model parameters underscores the significance of pre-trained models. However, deployment constraints often necessitate models of varying sizes, exposing limitations in the conventional pre-training and fine-tuning paradigm, particularly when target model sizes are incompatible with pre-trained ones. To address this challenge, we propose WAVE, a novel approach that reformulates variable-sized model initialization from a multi-task perspective, where initializing each model size is treated as a distinct task. WAVE employs shared, size-agnostic weight templates alongside size-specific weight scalers to achieve consistent initialization across various model sizes. These weight templates, constructed within the Learngene framework, integrate knowledge from pre-trained models through a distillation process constrained by Kronecker-based rules. Target models are then initialized by concatenating and weighting these templates, with adaptive connection rules established by lightweight weight scalers, whose parameters are learned from minimal training data. Extensive experiments demonstrate the efficiency of WAVE, which achieves state-of-the-art performance in initializing models of varying depths and widths. The knowledge encapsulated in weight templates is also task-agnostic, allowing for seamless transfer across diverse downstream datasets. Code will be made available at https://github.com/fu-feng/WAVE.
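The template-and-scaler idea can be illustrated with a toy sketch. It assumes one plausible reading of the "Kronecker-based rules": a target weight matrix is the sum of Kronecker products of a size-specific scaler with a shared template. The function names `kron` and `init_weight` and all numeric values are hypothetical, chosen for illustration only; the authors' actual formulation, and the way scalers are learned, may differ.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def init_weight(templates, scalers):
    """Assumed rule: W = sum_k kron(scaler_k, template_k).

    Templates are shared across model sizes; the scaler shapes alone
    determine the size of the resulting weight matrix.
    """
    total = None
    for template, scaler in zip(templates, scalers):
        block = kron(scaler, template)
        if total is None:
            total = block
        else:
            total = [
                [x + y for x, y in zip(row_t, row_b)]
                for row_t, row_b in zip(total, block)
            ]
    return total


# Two shared 2x2 templates (hypothetical values).
templates = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
]

# 1x1 scalers produce a 2x2 weight matrix for a small model.
small = init_weight(templates, [[[0.5]], [[2.0]]])

# 2x2 scalers produce a 4x4 weight matrix for a wider model
# from the same templates.
large = init_weight(
    templates,
    [[[1, 0], [0, 1]], [[0, 0], [0, 0]]],
)
```

In this sketch only the scalers change between the small and the large model, which mirrors the abstract's description of size-agnostic templates paired with lightweight, size-specific scalers. With NumPy, `numpy.kron` computes the same product.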

Fu Feng, Yucheng Xie, Jing Wang, Xin Geng • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy 79.2 | 1866 |
| Semantic segmentation | ADE20K | mIoU 33.84 | 936 |
| Image Classification | ImageNet-1K | Top-1 Acc 78.3 | 836 |
| Image Classification | CIFAR-10 | Accuracy 97.4 | 507 |
| Image Classification | Food-101 | Accuracy 85.5 | 494 |
| Image Classification | Stanford Cars | Accuracy 89.4 | 477 |
| Image Classification | CIFAR100 | Accuracy 75.58 | 331 |
| Image Classification | CUB-200 2011 | Accuracy 78.1 | 257 |
| Image Classification | iNaturalist 2019 | Top-1 Acc 63.7 | 98 |
| Image Classification | CUB-200 | Accuracy 56.77 | 92 |
Showing 10 of 18 rows
