Model Fusion via Retrofitting
About
Model fusion seeks to combine independently trained neural networks into a single model without retraining, but is complicated by representational divergence arising from permutation invariance, random initialization, and heterogeneous training data. Existing methods struggle particularly in zero-shot settings under non-IID data distributions, and are often limited to specific architectures or pairwise fusion. We introduce a neuron-centric family of fusion algorithms that frames fusion as a principled representation-matching problem: intermediate neurons across parent models are grouped into target representations, which the fused model's corresponding sub-networks are then trained to approximate. Unlike prior work, our approach incorporates neuron attribution scores to bias alignment toward salient features, and can be applied to any architecture that can be modularized as a DAG of levels; we validate it empirically on VGGs, ResNets, and ViTs. Experiments across standard benchmarks show consistent improvements over existing fusion methods, with the largest gains in zero-shot and non-IID scenarios. Code is available at https://github.com/AndrewSpano/model-fusion-via-retrofitting.
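The representation-matching idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single linear level, swaps in attribution-weighted k-means as the grouping step, and uses a least-squares fit in place of training the fused sub-network. The names `group_neurons`, `fit_linear_level`, and the uniform `scores` vector are hypothetical and chosen only for illustration.

```python
import numpy as np


def group_neurons(acts, scores, k, iters=20, seed=0):
    """Group parent neurons into k target representations (weighted k-means).

    acts:   (n_samples, n_neurons) activations of all parent neurons at one level
    scores: (n_neurons,) non-negative attribution scores (assumed to be given)
    Returns (targets, labels): targets is (n_samples, k), labels is (n_neurons,).
    """
    rng = np.random.default_rng(seed)
    points = acts.T  # one row per neuron, described by its activations
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every neuron to every center.
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            # Salient neurons (high score) pull the center harder.
            if mask.any() and scores[mask].sum() > 0:
                centers[j] = np.average(points[mask], axis=0, weights=scores[mask])
    return centers.T, labels


def fit_linear_level(inputs, targets):
    """Least-squares stand-in for training: inputs @ W approximates targets."""
    W, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return W


# Two toy "parents": parent B is a permuted, slightly perturbed copy of parent A.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
w_a = rng.normal(size=(8, 4))
w_b = w_a[:, rng.permutation(4)] + 0.01 * rng.normal(size=(8, 4))

acts = np.concatenate([x @ w_a, x @ w_b], axis=1)  # (256, 8) parent neurons
scores = np.ones(acts.shape[1])                    # uniform attribution (assumption)

targets, labels = group_neurons(acts, scores, k=4)
w_fused = fit_linear_level(x, targets)             # (8, 4) fused level
```

In this toy setting the targets are weighted averages of linear neurons, so the least-squares fit reproduces them almost exactly; with nonlinear levels and real attribution scores, the fit would instead be an iterative training step, as the abstract describes.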
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | Tiny ImageNet (test) | Accuracy | 54.2 | 722 |
| Image Classification | CIFAR-100 (test) | Accuracy | 75.6 | 63 |
| Image Classification | CIFAR-10 non-IID s=2 | Test Accuracy | 86.6 | 33 |
| Image Classification | CIFAR-10 (2-way sharded split) | Accuracy | 80.9 | 24 |
| Image Classification | CIFAR-100 6-way Sharded (test) | Test Accuracy | 34.7 | 19 |
| Image Classification | CIFAR10 Non-IID (4-Class partition) | Accuracy | 79.7 | 19 |
| Image Classification | CIFAR-100 2-way Sharded (test) | Test Accuracy | 54.5 | 18 |
| Image Classification | Tiny-ImageNet (Sharded 2-way split) | Accuracy | 32.5 | 18 |
| Image Classification | CIFAR-100 4-way Sharded (test) | Accuracy | 41.4 | 17 |
| Image Classification | CIFAR-10 4-way sharded | Accuracy | 56.4 | 14 |