Model Fusion via Optimal Transport

About

Combining different models is a widely used paradigm in machine learning applications. While the most common approach is to form an ensemble of models and average their individual predictions, this approach is often rendered infeasible by given resource constraints in terms of memory and computation, which grow linearly with the number of models. We present a layer-wise model fusion algorithm for neural networks that utilizes optimal transport to (soft-) align neurons across the models before averaging their associated parameters. We show that this can successfully yield "one-shot" knowledge transfer (i.e., without requiring any retraining) between neural networks trained on heterogeneous non-i.i.d. data. In both i.i.d. and non-i.i.d. settings, we illustrate that our approach significantly outperforms vanilla averaging, as well as how it can serve as an efficient replacement for the ensemble with moderate fine-tuning, for standard convolutional networks (like VGG11), residual networks (like ResNet18), and multi-layer perceptrons on CIFAR10, CIFAR100, and MNIST. Finally, our approach also provides a principled way to combine the parameters of neural networks with different widths, and we explore its application for model compression. The code is available at https://github.com/sidak/otfusion.
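
The align-then-average idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the repository linked above is the reference). It makes several simplifying assumptions: two bias-free two-layer MLPs of equal width, a squared Euclidean cost between incoming weight vectors, and a hard one-to-one matching computed with SciPy's `linear_sum_assignment` in place of a general (soft) optimal transport plan. The function names `align_hidden_units`, `fuse_two_layer_mlps`, and `forward` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_hidden_units(w1_a, w1_b):
    """Return perm so that hidden unit perm[i] of model B is matched to unit i of model A."""
    # cost[i, j] = squared Euclidean distance between the incoming weights
    # of neuron i in model A and neuron j in model B.
    cost = ((w1_a[:, None, :] - w1_b[None, :, :]) ** 2).sum(axis=-1)
    _, col_ind = linear_sum_assignment(cost)  # hard matching (assumption)
    return col_ind


def fuse_two_layer_mlps(w1_a, w2_a, w1_b, w2_b):
    """Align model B's hidden layer to model A's, then average the parameters."""
    perm = align_hidden_units(w1_a, w1_b)
    w1_b_aligned = w1_b[perm]      # reorder hidden units (rows of layer 1)
    w2_b_aligned = w2_b[:, perm]   # reorder the matching input columns of layer 2
    return 0.5 * (w1_a + w1_b_aligned), 0.5 * (w2_a + w2_b_aligned)


def forward(w1, w2, x):
    """Bias-free two-layer MLP with a ReLU hidden layer."""
    return np.maximum(x @ w1.T, 0.0) @ w2.T


# Toy check: model B is model A with its hidden units shuffled, so both
# compute the same function even though their weight matrices differ.
rng = np.random.default_rng(0)
w1_a = rng.normal(size=(8, 4))
w2_a = rng.normal(size=(3, 8))
shuffle = rng.permutation(8)
w1_b, w2_b = w1_a[shuffle], w2_a[:, shuffle]

x = rng.normal(size=(5, 4))
fused_w1, fused_w2 = fuse_two_layer_mlps(w1_a, w2_a, w1_b, w2_b)
fused_matches = np.allclose(forward(fused_w1, fused_w2, x), forward(w1_a, w2_a, x))
```

In this toy case the matching undoes the shuffle, so the fused network reproduces model A exactly, whereas averaging the unaligned weights would mix unrelated neurons. Handling networks of different widths, as the abstract describes, needs a transport plan rather than a permutation, which this sketch does not cover.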

Sidak Pal Singh, Martin Jaggi • 2019

Related benchmarks

Task	Dataset	Result
Few-shot classification	MiniImagenet	5-way 5-shot Accuracy 27.22	98
Few-shot classification	CUB	--	96
Few-shot classification	CIFAR-FS	Accuracy (5-way 1-shot) 29.1	78
Image Classification	CIFAR-10 non-IID s=2	Test Accuracy 49.6	33
Image Classification	CIFAR-10 (2-way sharded split)	Accuracy 28.1	24
Image Classification	CIFAR10 Non-IID (4-Class partition)	Accuracy 14.8	19
Image Classification	CIFAR-10 8-way Non-IID	Test Accuracy 13.4	14
Image Classification	CIFAR-10 4-way sharded	Accuracy 11.8	14
Image Classification	CIFAR-10 (6-way sharded)	Accuracy 10	14
Node Classification Retention	Cora, CiteSeer, Actor, Amazon-Ratings, and Arxiv specialist subsets (held-out)	Retention A92	10

Showing 10 of 11 rows
