
Transport and Merge: Cross-Architecture Merging for Large Language Models

About

Large language models (LLMs) achieve strong capabilities by scaling model capacity and training data, yet many real-world deployments rely on smaller models trained or adapted from low-resource data. This gap motivates the need for mechanisms to transfer knowledge from large, high-resource models to smaller, low-resource targets. While model merging provides an effective transfer mechanism, most existing approaches assume architecture-compatible models and therefore cannot directly transfer knowledge from large high-resource LLMs to heterogeneous low-resource targets. In this work, we propose a cross-architecture merging framework based on optimal transport (OT) that aligns activations to infer cross-neuron correspondences between heterogeneous models. The resulting transport plans are then used to guide direct weight-space fusion, enabling effective high-resource to low-resource transfer using only a small set of inputs. Extensive experiments across low-resource languages and specialized domains demonstrate consistent improvements over target models.
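The pipeline sketched in the abstract — probe both models with a small input set, align their neuron activations with optimal transport, then use the transport plan to fuse weights — can be illustrated on a single pair of mismatched-width linear layers. This is a minimal sketch with entropic (Sinkhorn) OT, not the paper's exact method; the layer sizes, cost function, and 50/50 fusion rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in = 8               # shared input dimension
n_src, n_tgt = 6, 4    # heterogeneous widths: source layer wider than target

W_src = rng.normal(size=(n_src, d_in))   # source (high-resource) layer weights
W_tgt = rng.normal(size=(n_tgt, d_in))   # target (low-resource) layer weights

# 1) Collect activations on a small probe set of inputs.
X = rng.normal(size=(16, d_in))
A_src = np.maximum(X @ W_src.T, 0.0)     # (16, n_src) ReLU activations
A_tgt = np.maximum(X @ W_tgt.T, 0.0)     # (16, n_tgt)

# 2) Cost between neuron pairs = squared distance of activation profiles,
#    normalized so the entropic regularizer is well-scaled (assumption).
C = ((A_src[:, :, None] - A_tgt[:, None, :]) ** 2).sum(axis=0)  # (n_src, n_tgt)
C = C / C.max()

# 3) Entropic OT via Sinkhorn iterations yields a soft transport plan T
#    encoding cross-neuron correspondences between the two models.
def sinkhorn(C, a, b, eps=0.1, iters=200):
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

a = np.full(n_src, 1.0 / n_src)          # uniform mass over source neurons
b = np.full(n_tgt, 1.0 / n_tgt)          # uniform mass over target neurons
T = sinkhorn(C, a, b)                    # (n_src, n_tgt) transport plan

# 4) Transport source weights into the target's neuron basis (barycentric
#    projection), then fuse directly in weight space.
W_src_aligned = (T.T @ W_src) / T.sum(axis=0)[:, None]  # (n_tgt, d_in)
W_merged = 0.5 * W_tgt + 0.5 * W_src_aligned

print(W_merged.shape)  # (4, 8): merged weights live in the target architecture
```

The key property is that only the small probe set `X` is needed: the transport plan is inferred from activations, so no architectural compatibility between the two weight matrices is required.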

Chenhang Cui, Binyun Yang, Fei Shen, Yuxin Chen, Jingnan Zheng, Xiang Wang, An Zhang, Tat-Seng Chua • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Understanding | MMLU | Medicine Accuracy | 40.5 | 17 |
| Machine Reading Comprehension | BELEBELE Indonesian | Accuracy (Target Language) | 36.9 | 13 |
| Mathematical Reasoning | MGSM Thai | Score | 72 | 5 |
| Causal Reasoning | XCOPA Thai | Accuracy | 60 | 3 |
| Commonsense Reasoning | XCOPA Indonesian | Accuracy | 60 | 3 |
| Language Understanding | CMMLU Cantonese | Accuracy (Humanities) | 27.72 | 3 |
| Language Understanding | MMLU Thai | Score | 17 | 3 |
| Language Understanding | MalayMMLU | Humanities Score | 48.81 | 3 |
| Multiple-choice Question Answering | TruthfulQA-MC Indonesian | Accuracy | 36.6 | 3 |
| Multiple-choice Question Answering | MMLU financial subsets (test) | Business Ethics Accuracy | 38 | 3 |

Showing 10 of 11 rows.
