Transport and Merge: Cross-Architecture Merging for Large Language Models
About
Large language models (LLMs) achieve strong capabilities by scaling model capacity and training data, yet many real-world deployments rely on smaller models trained or adapted from low-resource data. This gap motivates the need for mechanisms to transfer knowledge from large, high-resource models to smaller, low-resource targets. While model merging provides an effective transfer mechanism, most existing approaches assume architecture-compatible models and therefore cannot directly transfer knowledge from large high-resource LLMs to heterogeneous low-resource targets. In this work, we propose a cross-architecture merging framework based on optimal transport (OT) that aligns activations to infer cross-neuron correspondences between heterogeneous models. The resulting transport plans are then used to guide direct weight-space fusion, enabling effective high-resource to low-resource transfer using only a small set of inputs. Extensive experiments across low-resource languages and specialized domains demonstrate consistent improvements over target models.
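The core idea above, aligning neurons of two layers with different widths via an optimal transport plan over their activations, then using that plan to fuse weights, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Sinkhorn solver, the cosine-distance cost, the probe activations, and the fusion coefficient `alpha` are all assumptions chosen for clarity.

```python
import numpy as np

def sinkhorn(cost, reg=0.05, iters=200):
    """Entropy-regularized OT between uniform marginals (Sinkhorn iterations)."""
    n, m = cost.shape
    a, b = np.ones(n) / n, np.ones(m) / m   # uniform neuron masses
    K = np.exp(-cost / reg)
    v = np.ones(m) / m
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]       # transport plan, shape (n, m)

def unit_rows(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Activations of one layer in each model on a small shared probe set:
# rows = neurons, cols = probe inputs. Widths differ across architectures.
acts_src = rng.standard_normal((8, 32))   # source (large) layer: 8 neurons
acts_tgt = rng.standard_normal((6, 32))   # target (small) layer: 6 neurons

# Cost = cosine distance between source/target neuron activation profiles.
cost = 1.0 - unit_rows(acts_src) @ unit_rows(acts_tgt).T

T = sinkhorn(cost)                         # cross-neuron correspondence, (8, 6)

# Use the (column-normalized) plan to project source weights into the
# target's neuron space, then fuse the two weight matrices directly.
W_src = rng.standard_normal((8, 16))
W_tgt = rng.standard_normal((6, 16))
W_proj = (T / T.sum(axis=0, keepdims=True)).T @ W_src   # (6, 16)
alpha = 0.5                                # hypothetical fusion coefficient
W_merged = (1 - alpha) * W_tgt + alpha * W_proj
```

In practice this alignment-and-fuse step would be repeated per layer, with the probe set playing the role of the "small set of inputs" mentioned above; the sketch only shows a single linear layer.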
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Understanding | MMLU | Medicine Accuracy | 40.5 | 17 |
| Machine Reading Comprehension | BELEBELE Indonesian | Accuracy (Target Language) | 36.9 | 13 |
| Mathematical Reasoning | MGSM Thai | Score | 72 | 5 |
| Causal Reasoning | XCOPA Thai | Accuracy | 60 | 3 |
| Commonsense Reasoning | XCOPA Indonesian | Accuracy | 60 | 3 |
| Language Understanding | CMMLU Cantonese | Accuracy (Humanities) | 27.72 | 3 |
| Language Understanding | MMLU Thai | Score | 17 | 3 |
| Language Understanding | MalayMMLU | Humanities Score | 48.81 | 3 |
| Multiple-choice Question Answering | TruthfulQA-MC Indonesian | Accuracy | 36.6 | 3 |
| Multiple-choice Question Answering | MMLU financial subsets (test) | Business Ethics Accuracy | 38 | 3 |