
XPERT: Expert Knowledge Transfer for Effective Training of Language Models

About

Mixture-of-Experts (MoE) language models organize knowledge into explicitly routed expert modules, which makes expert-level representations traceable and analyzable. By analyzing expert activation patterns in MoE large language models (LLMs), we find that a subset of experts is consistently activated across diverse knowledge domains. These common experts encode cross-domain, generalizable knowledge that is closely related to model generalization, which raises the question of how such identifiable expert knowledge can be reused in practice. Motivated by this observation, we propose XPERT, a framework that extracts, consolidates, and reuses expert knowledge from pre-trained MoE LLMs to support more effective training of language models across different model scales. XPERT identifies cross-domain experts via inference-only analysis, refines their representations through tensor decomposition, and adapts the extracted knowledge for reuse in downstream models. Experiments on language understanding and dialogue generation benchmarks show that models that reuse expert knowledge achieve consistently stronger performance and faster convergence than strong baselines. These results highlight MoE LLMs as structured, reusable knowledge sources and demonstrate the value of expert-level knowledge reuse for improving model training.

Chang Liu, Boyu Shi, Xu Yang, Xin Geng · 2026
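The abstract does not say how "consistently activated across diverse knowledge domains" is measured. As a minimal sketch, assuming a top-k router whose per-token expert choices have been logged separately for each domain, one could rank experts by activation count within each domain and keep those that fall in the top-m set of every domain (or of a chosen fraction of domains). The helper names (`domain_top_experts`, `find_common_experts`), the cutoff `m`, the `min_domain_frac` threshold, and the toy routing log are hypothetical illustrations, not XPERT's actual selection rule.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical sketch: ranking experts by activation frequency is an assumed
# criterion for "common experts", not the selection rule used by XPERT.


def domain_top_experts(routed: List[List[int]], m: int) -> Set[int]:
    """Return the m most frequently selected expert ids for one domain.

    `routed` holds, for each token, the expert ids picked by a top-k router.
    Ties are broken by the smaller expert id so the result is deterministic.
    """
    counts = Counter(e for token_experts in routed for e in token_experts)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {expert for expert, _ in ranked[:m]}


def find_common_experts(
    routing_log: Dict[str, List[List[int]]],
    m: int = 2,
    min_domain_frac: float = 1.0,
) -> List[int]:
    """Experts in the per-domain top-m for at least `min_domain_frac` of domains."""
    per_domain = [domain_top_experts(routed, m) for routed in routing_log.values()]
    # Count in how many domains each expert made the top-m cut.
    hits = Counter(e for top in per_domain for e in top)
    needed = min_domain_frac * len(per_domain)
    return sorted(e for e, n in hits.items() if n >= needed)


# Toy routing log (hypothetical): top-2 expert choices for four tokens per domain.
log = {
    "math": [[0, 3], [0, 1], [3, 1], [0, 3]],
    "law": [[0, 5], [0, 2], [5, 2], [0, 5]],
    "medicine": [[0, 6], [6, 7], [0, 6], [0, 7]],
}
print(find_common_experts(log, m=2))  # [0]
```

Here expert 0 is among the two most-used experts in all three domains, while experts 3, 5, and 6 are each domain-specific. Lowering `min_domain_frac` below 1.0 relaxes "every domain" to "most domains". The tensor-decomposition and adaptation stages that XPERT applies afterwards are not sketched here.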

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | WinoGrande | Accuracy | 50.75 | 1442 |
| Medical Question Answering | MedMCQA | Accuracy | 35.5 | 521 |
| Reading Comprehension | BoolQ | Accuracy | 73.91 | 228 |
| Commonsense Reasoning | PIQA | Accuracy | 55.44 | 213 |
| Dialogue Generation | DollyEval | ROUGE-L | 24.19 | 16 |
| Dialogue Generation | S-NI | ROUGE-L | 19.82 | 16 |
| Dialogue Generation | UnNI | ROUGE-L | 23.2 | 16 |
| Dialogue Generation | SelfInst | ROUGE-L | 11.31 | 16 |
| Dialogue Generation | Vicuna | ROUGE-L | 15.05 | 16 |
| Legal Reasoning | CaseHold | Accuracy | 83.13 | 16 |
