
WAVE: Weight Templates for Adaptive Initialization of Variable-sized Models

About

The growing complexity of model parameters underscores the significance of pre-trained models. However, deployment constraints often necessitate models of varying sizes, exposing limitations in the conventional pre-training and fine-tuning paradigm, particularly when target model sizes are incompatible with pre-trained ones. To address this challenge, we propose WAVE, a novel approach that reformulates variable-sized model initialization from a multi-task perspective, where initializing each model size is treated as a distinct task. WAVE employs shared, size-agnostic weight templates alongside size-specific weight scalers to achieve consistent initialization across various model sizes. These weight templates, constructed within the Learngene framework, integrate knowledge from pre-trained models through a distillation process constrained by Kronecker-based rules. Target models are then initialized by concatenating and weighting these templates, with adaptive connection rules established by lightweight weight scalers, whose parameters are learned from minimal training data. Extensive experiments demonstrate the efficiency of WAVE, achieving state-of-the-art performance in initializing models of various depth and width. The knowledge encapsulated in weight templates is also task-agnostic, allowing for seamless transfer across diverse downstream datasets. Code will be made available at https://github.com/fu-feng/WAVE.
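The abstract's idea of shared weight templates combined by size-specific weight scalers can be illustrated with a small sketch. It assumes, purely for illustration, that a target weight matrix is formed as a sum of Kronecker products between each scaler and its template. The function names `kron` and `init_weight`, the matrix shapes, and this exact combination rule are hypothetical and are not taken from the authors' code.

```python
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: every entry a[i][j] scales a full copy of b."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


def init_weight(templates: List[Matrix], scalers: List[Matrix]) -> Matrix:
    """Assumed rule: W = sum_k kron(scaler_k, template_k).

    All templates must share one shape and all scalers must share one shape.
    The templates are size-agnostic; the scaler shape alone sets the size of W.
    """
    if not templates or len(templates) != len(scalers):
        raise ValueError("need one scaler per template, and at least one pair")
    total = kron(scalers[0], templates[0])
    for s, t in zip(scalers[1:], templates[1:]):
        block = kron(s, t)
        total = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(total, block)]
    return total


# Two shared 2x2 templates (hypothetical values).
templates = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]

# 1x1 scalers give a 2x2 weight; 2x3 scalers give a 4x6 weight.
small = init_weight(templates, [[[1.0]], [[2.0]]])
large = init_weight(
    templates,
    [[[1.0, 0.5, 0.0], [0.0, 1.0, 0.5]], [[0.1, 0.0, 0.2], [0.3, 0.0, 0.1]]],
)
print(len(small), len(small[0]))
print(len(large), len(large[0]))
```

In this sketch the same two templates initialize both weight matrices; only the scalers, which hold far fewer parameters than the resulting weights, change with the target size.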

Fu Feng, Yucheng Xie, Jing Wang, Xin Geng • 2024

Related benchmarks

Task | Dataset | Result | Rank
--- | --- | --- | ---
Image Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy 79.2 | 1952
Image Classification | ImageNet-1K | Top-1 Acc 78.3 | 1239
Semantic segmentation | ADE20K | mIoU 33.84 | 1024
Image Classification | Stanford Cars | Accuracy 89.4 | 635
Image Classification | Food-101 | Accuracy 85.5 | 542
Image Classification | CIFAR-10 | Accuracy 97.4 | 507
Image Classification | CUB-200 2011 | Accuracy 78.1 | 356
Image Classification | CIFAR100 | Accuracy 75.58 | 347
Image Classification | CUB-200 | Accuracy 56.77 | 106
Image Classification | iNaturalist 2019 | Top-1 Acc 63.7 | 98
Showing 10 of 19 rows
