
Aggregation Alignment for Federated Learning with Mixture-of-Experts under Data Heterogeneity

About

Large language models (LLMs) increasingly adopt Mixture-of-Experts (MoE) architectures to scale model capacity while reducing computation. Fine-tuning these MoE-based LLMs often requires access to distributed and privacy-sensitive data, making centralized fine-tuning impractical. Federated learning (FL) therefore provides a paradigm to collaboratively fine-tune MoE-based LLMs, enabling each client to integrate diverse knowledge without compromising data privacy. However, the integration of MoE-based LLM fine-tuning into FL encounters two critical aggregation challenges due to inherent data heterogeneity across clients: (i) divergent local data distributions drive clients to develop distinct gating preferences for localized expert selection, causing direct parameter aggregation to produce a "one-size-fits-none" global gating network, and (ii) same-indexed experts develop disparate semantic roles across clients, leading to expert semantic blurring and the degradation of expert specialization. To address these challenges, we propose FedAlign-MoE, a federated aggregation alignment framework that jointly enforces routing consistency and expert semantic alignment. Specifically, FedAlign-MoE aggregates gating behaviors by aligning routing distributions through consistency weighting and optimizes local gating networks through distribution regularization, maintaining cross-client stability without overriding discriminative local preferences. Meanwhile, FedAlign-MoE explicitly quantifies semantic consistency among same-indexed experts across clients and selectively aggregates updates from semantically aligned clients, ensuring stable and specialized functional roles for global experts. Extensive experiments demonstrate that FedAlign-MoE outperforms state-of-the-art benchmarks, achieving faster convergence and superior accuracy in non-IID federated environments.
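The abstract describes two server-side mechanisms: consistency-weighted aggregation of client routing distributions, and selective aggregation of same-indexed expert updates based on semantic consistency. The paper's exact formulas are not given here, so the following is only a minimal illustrative sketch under assumed choices: symmetric KL divergence as the routing-consistency measure, exponential down-weighting of divergent clients, and cosine similarity to the mean update direction as the expert semantic-consistency score. All function names and the threshold `tau` are hypothetical.

```python
import math

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two routing distributions (lists of probs)."""
    fwd = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    bwd = sum(qi * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))
    return 0.5 * (fwd + bwd)

def aggregate_gating(routing_dists):
    """Consistency-weighted aggregation of per-client routing distributions.

    routing_dists: one row per client, each an average softmax routing
    distribution over experts observed on that client's local data.
    Clients closer to the consensus routing behavior get larger weight.
    """
    n, k = len(routing_dists), len(routing_dists[0])
    ref = [sum(d[j] for d in routing_dists) / n for j in range(k)]  # reference behavior
    w = [math.exp(-sym_kl(d, ref)) for d in routing_dists]          # consistency weights
    s = sum(w)
    w = [wi / s for wi in w]
    global_dist = [sum(w[i] * routing_dists[i][j] for i in range(n)) for j in range(k)]
    return global_dist, w

def cosine(u, v):
    """Cosine similarity between two flattened parameter-update vectors."""
    nu = math.sqrt(sum(x * x for x in u)) + 1e-12
    nv = math.sqrt(sum(x * x for x in v)) + 1e-12
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)

def aggregate_experts(expert_updates, tau=0.0):
    """Selective aggregation of same-indexed expert updates across clients.

    Clients whose update direction disagrees with the mean direction
    (cosine similarity below tau) are excluded as semantically misaligned;
    falls back to plain averaging if every client would be excluded.
    """
    n, dim = len(expert_updates), len(expert_updates[0])
    mean_dir = [sum(u[j] for u in expert_updates) / n for j in range(dim)]
    keep = [cosine(u, mean_dir) >= tau for u in expert_updates]
    if not any(keep):
        keep = [True] * n
    kept = [u for u, k in zip(expert_updates, keep) if k]
    agg = [sum(u[j] for u in kept) / len(kept) for j in range(dim)]
    return agg, keep
```

For example, a client whose gating routes 90/10 while its peers route 50/50 receives a smaller aggregation weight, and an expert update pointing opposite to the group's mean direction is dropped from that expert's global average.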

Zihan Fang, Qianru Wang, Haonan An, Zheng Lin, Yiqin Deng, Xianhao Chen, Yuguang Fang• 2026

Related benchmarks

Task                         | Dataset                              | Result               | Rank
General Knowledge Evaluation | MMLU                                 | MMLU Accuracy: 51.31 | 45
Multi-class classification   | AGNews, IID                          | Accuracy: 94.24      | 14
Commonsense Reasoning        | PIQA, IID distribution               | Accuracy: 81.15      | 10
Commonsense Reasoning        | HellaSwag, IID distribution          | Accuracy: 77.82      | 10
Commonsense Reasoning        | PIQA, non-IID distribution, alpha=0.1      | Accuracy: 74.1  | 10
Commonsense Reasoning        | HellaSwag, non-IID distribution, alpha=0.1 | Accuracy: 58.47 | 10
General Knowledge Evaluation | MMLU, non-IID distribution, alpha=0.1      | Accuracy: 39.79 | 10
Topic Classification         | AGNews, non-IID distribution, alpha=0.1    | Accuracy: 85.22 | 10
