Aggregation Alignment for Federated Learning with Mixture-of-Experts under Data Heterogeneity
About
Large language models (LLMs) increasingly adopt Mixture-of-Experts (MoE) architectures to scale model capacity while reducing computation. Fine-tuning these MoE-based LLMs often requires access to distributed and privacy-sensitive data, making centralized fine-tuning impractical. Federated learning (FL) therefore provides a paradigm for collaboratively fine-tuning MoE-based LLMs, enabling each client to integrate diverse knowledge without compromising data privacy.

However, integrating MoE-based LLM fine-tuning into FL encounters two critical aggregation challenges caused by inherent data heterogeneity across clients: (i) divergent local data distributions drive clients to develop distinct gating preferences for localized expert selection, so direct parameter aggregation produces a "one-size-fits-none" global gating network, and (ii) same-indexed experts develop disparate semantic roles across clients, leading to expert semantic blurring and the degradation of expert specialization.

To address these challenges, we propose FedAlign-MoE, a federated aggregation alignment framework that jointly enforces routing consistency and expert semantic alignment. Specifically, FedAlign-MoE aggregates gating behaviors by aligning routing distributions through consistency weighting and optimizes local gating networks through distribution regularization, maintaining cross-client stability without overriding discriminative local preferences. Meanwhile, FedAlign-MoE explicitly quantifies semantic consistency among same-indexed experts across clients and selectively aggregates updates from semantically aligned clients, ensuring stable and specialized functional roles for global experts. Extensive experiments demonstrate that FedAlign-MoE outperforms state-of-the-art baselines, achieving faster convergence and superior accuracy in non-IID federated environments.
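To make the two alignment mechanisms concrete, here is a minimal, hypothetical sketch of how consistency-weighted aggregation might look. All function names and the exact weighting forms (an exponential of the KL divergence for routing consistency, and clipped cosine similarity for expert semantic alignment) are illustrative assumptions, not the paper's actual implementation.

```python
import math

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete routing distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def gating_consistency_weights(routing_dists):
    """Weight each client's gating update by how consistent its expert-routing
    distribution is with the cross-client mean (assumed form: exp(-KL))."""
    n, k = len(routing_dists), len(routing_dists[0])
    mean = [sum(p[j] for p in routing_dists) / n for j in range(k)]
    raw = [math.exp(-kl(p, mean)) for p in routing_dists]
    total = sum(raw)
    return [r / total for r in raw]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def aggregate_expert(client_updates):
    """Aggregate one expert index across clients, down-weighting clients whose
    update direction disagrees with the mean (semantically misaligned clients).
    Negative similarities are clipped to zero, excluding opposing updates."""
    n, dim = len(client_updates), len(client_updates[0])
    mean = [sum(u[d] for u in client_updates) / n for d in range(dim)]
    weights = [max(cosine(u, mean), 0.0) for u in client_updates]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    return [sum(w * u[d] for w, u in zip(weights, client_updates))
            for d in range(dim)]
```

For example, with three clients whose updates for one expert are `[1, 0]`, `[1, 0.1]`, and `[-1, 0]`, the third (opposing) client receives near-zero weight, so the aggregated expert stays close to the direction the aligned majority agrees on.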
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| General Knowledge Evaluation | MMLU | Accuracy | 51.31 | 45 |
| Multi-class classification | AGNews (IID) | Accuracy | 94.24 | 14 |
| Commonsense Reasoning | PIQA (IID) | Accuracy | 81.15 | 10 |
| Commonsense Reasoning | HellaSwag (IID) | Accuracy | 77.82 | 10 |
| Commonsense Reasoning | PIQA (non-IID, alpha=0.1) | Accuracy | 74.1 | 10 |
| Commonsense Reasoning | HellaSwag (non-IID, alpha=0.1) | Accuracy | 58.47 | 10 |
| General Knowledge Evaluation | MMLU (non-IID, alpha=0.1) | Accuracy | 39.79 | 10 |
| Topic Classification | AGNews (non-IID, alpha=0.1) | Accuracy | 85.22 | 10 |