Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

About

Pretraining a neural network on a large dataset is becoming a cornerstone in machine learning that is within the reach of only a few communities with large resources. We aim at an ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high-quality ImageNet parameters of other neural networks. By using predicted parameters for initialization, we are able to boost training of diverse ImageNet models available in PyTorch. When transferred to other datasets, models initialized with predicted parameters also converge faster and reach competitive final performance.

Boris Knyazev, Doha Hwang, Simon Lacoste-Julien • 2023

Related benchmarks

Task	Dataset	Metric	Result
Image Classification	ImageNet-1K 1.0 (val)	Top-1 Accuracy	49.1	2238
Image Classification	ImageNet-1K	Top-1 Acc	53.19	1239
Image Classification	Stanford Cars	Accuracy	30.6	660
Image Classification	Food-101	Accuracy	76.2	570
Image Classification	CIFAR-10	Accuracy	93.9	507
Image Classification	CUB-200 2011	Accuracy	45.2	374
Image Classification	iNaturalist	Accuracy	55.5	74
Image Classification	Downstream Datasets Average	Average Accuracy	61	68
Image Classification	Flowers, CUB, Cars, CIFAR-10, CIFAR-100, Food, iNat downstream evaluation	Accuracy (Flowers)	52.7	28

