
Reusing Pretrained Models by Multi-linear Operators for Efficient Training

About

Training large models from scratch usually costs a substantial amount of resources. To address this problem, recent studies such as bert2BERT and LiGO have reused small pretrained models to initialize a large model (termed the "target model"), leading to a considerable acceleration in training. Despite their successes, these previous studies grew pretrained models by mapping partial weights only, ignoring potential correlations across the entire model. As we show in this paper, there are inter- and intra-interactions among the weights of both the pretrained and the target models. As a result, the partial mapping may not capture the complete information and can lead to inadequate growth. In this paper, we propose a method that linearly correlates each weight of the target model with all the weights of the pretrained model, further enhancing the acceleration. We utilize multi-linear operators to reduce computational and spatial complexity, enabling acceptable resource requirements. Experiments demonstrate that our method saves 76% of the computational cost of DeiT-base transferred from DeiT-small, outperforming bert2BERT by +12.0% and LiGO by +20.7%, respectively.
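To make the core idea concrete, here is a minimal NumPy sketch of why a multi-linear (factored) growth operator is necessary. Mapping every pretrained weight to every target weight with one full linear operator on the flattened weights would require a parameter count quadratic in the weight sizes; a factored bilinear map of the form W_large = A · W_small · Bᵀ keeps the "all weights influence all weights" structure per matrix at a tiny fraction of the cost. This is an illustrative simplification of the paper's approach, not its exact construction; the shapes are chosen to match DeiT-small (384) and DeiT-base (768) hidden sizes, and the growth factors A and B are random placeholders for learned parameters.

```python
import numpy as np

# Hedged sketch (not the paper's exact method): grow one pretrained
# weight matrix into a larger target matrix with a factored map
#     W_large = A @ W_small @ B.T,
# instead of a full linear operator acting on vec(W_small), which
# would need (d_out * d_in) * (D_out * D_in) parameters.

rng = np.random.default_rng(0)

d_out, d_in = 384, 384  # pretrained (e.g. DeiT-small) hidden sizes
D_out, D_in = 768, 768  # target (e.g. DeiT-base) hidden sizes

W_small = rng.standard_normal((d_out, d_in))

# Growth factors; random here, learned in practice.
A = rng.standard_normal((D_out, d_out)) / np.sqrt(d_out)
B = rng.standard_normal((D_in, d_in)) / np.sqrt(d_in)

W_large = A @ W_small @ B.T  # shape (D_out, D_in) = (768, 768)

full_params = (d_out * d_in) * (D_out * D_in)   # naive full linear map
factored_params = D_out * d_out + D_in * d_in   # factored map

print(W_large.shape)                       # (768, 768)
print(full_params // factored_params)      # 147456x fewer parameters
```

The savings factor here, (d_out · D_out · d_in · D_in) / (D_out · d_out + D_in · d_in), is what makes correlating each target weight with the entire pretrained weight tensor computationally feasible.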

Yu Pan, Ye Yuan, Yichun Yin, Zenglin Xu, Lifeng Shang, Xin Jiang, Qun Liu • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | Stanford Cars | Accuracy | 87.2 | 635 |
| Image Classification | Food-101 | Accuracy | 83.8 | 542 |
| Natural Language Understanding | GLUE | SST-2 | 92.71 | 531 |
| Image Classification | CIFAR-100 | Accuracy | 82.3 | 435 |
| Classification | Cars | Accuracy | 91.83 | 395 |
| Image Classification | CUB-200 2011 | Accuracy | 73.7 | 356 |
| Image Classification | CIFAR100 | Accuracy | 90.23 | 347 |
| Image Classification | Oxford Flowers 102 | Accuracy | 94.6 | 234 |
| Image Classification | CIFAR10 | Accuracy | 99.13 | 137 |
| Image Classification | Flowers | Accuracy | 97.49 | 127 |

Showing 10 of 19 rows.
