
Can Heterogeneous Language Models Be Fused?

About

Model merging aims to integrate multiple expert models into a single model that inherits their complementary strengths without incurring the inference-time cost of ensembling. Recent progress has shown that merging can be highly effective when all source models are homogeneous, i.e., derived from the same pretrained backbone and therefore share aligned parameter coordinates or compatible task vectors. Yet this assumption is increasingly unrealistic in open model ecosystems, where useful experts are often built on different families such as Llama, Qwen, and Mistral. In such heterogeneous settings, direct weight-space fusion becomes ill-posed due to architectural mismatch, latent basis misalignment, and amplified cross-source conflict. We address this problem with HeteroFusion for heterogeneous language model fusion, which consists of two key components: topology-based alignment that transfers knowledge across heterogeneous backbones by matching functional module structures instead of raw tensor coordinates, and conflict-aware denoising that suppresses incompatible or noisy transfer signals during fusion. We further provide analytical justification showing that preserving the target adapter basis while predicting structured updates leads to a stable and well-conditioned transfer process. Across heterogeneous transfer, multi-source fusion, noisy-source robustness, and cross-family generalization settings, HeteroFusion consistently outperforms strong merging, fusion, and ensemble baselines.
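To make the conflict-aware denoising idea concrete, here is a minimal, self-contained sketch of one common way such suppression can work: per-coordinate sign agreement across source updates, with conflicted coordinates zeroed out before averaging into the target. The function name, the agreement threshold `tau`, and the suppression rule are all assumptions for illustration; they are not the paper's actual algorithm.

```python
def conflict_aware_fuse(target, updates, tau=0.5):
    """Fuse candidate update vectors into a target parameter vector.

    Hypothetical sketch of conflict-aware denoising: a coordinate is
    transferred only when the source updates mostly agree in sign;
    conflicted coordinates are suppressed (left unchanged in the target).
    """
    n = len(updates)
    fused = list(target)
    for i in range(len(target)):
        vals = [u[i] for u in updates]
        # Sign of each source's update at this coordinate (-1, 0, or +1).
        signs = [(v > 0) - (v < 0) for v in vals]
        # Agreement in [0, 1]: 1 means all sources push the same way.
        agreement = abs(sum(signs)) / n
        if agreement >= tau:
            fused[i] = target[i] + sum(vals) / n
        # else: cross-source conflict -> suppress this coordinate.
    return fused


# Two sources disagree on coordinate 1, so it is left untouched.
result = conflict_aware_fuse(
    target=[0.0, 0.0, 0.0, 0.0],
    updates=[[1.0, 1.0, -1.0, 0.5],
             [1.0, -1.0, -1.0, 0.5]],
)
```

The same gating idea extends from flat vectors to per-module tensors; the key design choice is that disagreement is treated as noise to be suppressed rather than averaged away.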

Shilian Chen, Jie Zhou, Qin Chen, Wen Wu, Xin Li, Qi Feng, Liang He • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Named Entity Recognition | MIT Movie | Entity F1 | 83.29 | 57 |
| Relation Extraction | CoNLL04 | Relation Strict F1 | 63.95 | 52 |
| Named Entity Recognition | TweetNER7 | Entity F1 | 57.6 | 49 |
| Relation Extraction | CoNLL04 | F1 | 63.32 | 39 |
| Entity Typing | FindVehicle | Precision | 86.71 | 32 |
| Entity Typing | FabNER | Precision | 77.51 | 32 |
| Relation Extraction | New York Times | Precision | 68.46 | 32 |
| Aggregate Performance | MIT Movie, TweetNER7, New York Times, CoNLL04, FindVehicle, FabNER | Precision | 73.96 | 13 |
| Multi-source Heterogeneous Transfer | Average of NER, RE, ET | Precision | 74.18 | 10 |
