FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models

About

The training of diffusion models is computationally intensive, making effective pre-training essential. However, real-world deployments often demand models of variable sizes due to diverse memory and computational constraints, posing challenges when corresponding pre-trained versions are unavailable. To address this, we propose FINE, a novel pre-training method whose resulting model can flexibly factorize its knowledge into fundamental components, termed learngenes, enabling direct initialization of models of various sizes and eliminating the need for repeated pre-training. Rather than optimizing a conventional full-parameter model, FINE represents each layer's weights as the product of $U_{\star}$, $\Sigma_{\star}^{(l)}$, and $V_{\star}^\top$, where $U_{\star}$ and $V_{\star}$ serve as size-agnostic learngenes shared across layers, while $\Sigma_{\star}^{(l)}$ remains layer-specific. By jointly training these components, FINE forms a decomposable and transferable knowledge structure that allows efficient initialization through flexible recombination of learngenes, requiring only light retraining of $\Sigma_{\star}^{(l)}$ on limited data. Extensive experiments demonstrate the efficiency of FINE, achieving state-of-the-art performance in initializing variable-sized models across diverse resource-constrained deployments. Furthermore, models initialized by FINE effectively adapt to diverse tasks, showcasing the task-agnostic versatility of learngenes.
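The factorization described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes each $\Sigma_{\star}^{(l)}$ is diagonal and stored as a vector, and the function name `compose_layer_weights`, the width `d`, the rank `r`, and the layer counts are hypothetical values chosen for illustration. It shows only how models of different depth could share the same $U_{\star}$ and $V_{\star}$ while carrying their own layer-specific $\Sigma_{\star}^{(l)}$.

```python
import numpy as np

def compose_layer_weights(U, sigmas, V):
    """Return one weight matrix per layer: W_l = U @ diag(sigma_l) @ V.T.

    Assumption: each Sigma^(l) is diagonal and passed as a 1-D vector.
    """
    # (U * s) scales the columns of U by s, which equals U @ diag(s).
    return [(U * s) @ V.T for s in sigmas]

rng = np.random.default_rng(0)
d, r = 8, 4                      # hypothetical hidden width and rank
U = rng.standard_normal((d, r))  # stands in for the shared learngene U
V = rng.standard_normal((d, r))  # stands in for the shared learngene V

# A 6-layer and a 3-layer model reuse the same U and V. Only the
# layer-specific sigma vectors differ in number; in this sketch they are
# the only parameters that would be retrained after initialization.
sigmas_deep = [rng.standard_normal(r) for _ in range(6)]
sigmas_shallow = [np.ones(r) for _ in range(3)]

weights_deep = compose_layer_weights(U, sigmas_deep, V)
weights_shallow = compose_layer_weights(U, sigmas_shallow, V)
```

In this sketch the per-layer parameter count is `r` rather than `d * d`, which is one reason retraining only the layer-specific factors on limited data is plausible.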

Yucheng Xie, Fu Feng, Ruixiao Shi, Jianlu Shen, Jing Wang, Yong Rui, Xin Geng • 2024

Related benchmarks

Task	Dataset	Result
Image Classification	ImageNet-1K	Top-1 Acc 74.75	1239
Image Generation	LSUN church	FID 15.8	117
Class-conditioned image generation	ImageNet-1k 1.0 (test val)	FID 35.59	100
Image Generation	CelebA	FID 7.99	96
Image Generation	Pokemon	FDD 0.38	22
Image Generation	Bedroom	FID 14.9	22
Image Generation	Hubble	FDD 0.101	22
Image Generation	MRI	FDD 0.041	22

