MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs

About

The Mixture-of-Experts (MoE) architecture has become a predominant paradigm for scaling large language models (LLMs). Despite offering strong performance and computational efficiency, large MoE-based LLMs like DeepSeek-V3-0324 and Kimi-K2-Instruct are hard to deploy because of their substantial memory requirements. Recent works have explored MoE compression to address this issue, but existing methods often suffer considerable accuracy drops (e.g., 7-14% in relative terms) even at modest compression rates. This paper introduces a novel Mixture-of-Basis-Experts (MoBE) method that compresses the model while incurring minimal accuracy drops. Specifically, each up/gate matrix in an expert is decomposed via a rank decomposition as W = AB, where matrix A is unique to each expert. The relatively larger matrix B is further re-parameterized as a linear combination of basis matrices {B_i} shared across all experts within a given MoE layer. The factorization is learned by minimizing the reconstruction error relative to the original weight matrices. Experiments demonstrate that MoBE achieves notably lower accuracy drops than prior work. For instance, MoBE can reduce the parameter counts of Qwen3-235B-A22B-2507, DeepSeek-V3-0324 (671B) and Kimi-K2-Instruct (1T) by 24%-30% with only a 1%-2% absolute accuracy drop (about 2% in relative terms).
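
The factorization described above can be sketched in NumPy. This is a minimal illustration built only from the abstract, not the authors' implementation: the function names, tensor shapes, toy dimensions, and the plain mean-squared-error objective are all assumptions, and the published method may differ in details the abstract does not state.

```python
import numpy as np


def mobe_reconstruct(A, alpha, bases):
    """Rebuild expert weights as W_i = A_i @ (sum_j alpha[i, j] * bases[j]).

    Hypothetical shapes (not taken from the paper):
      A:     (n_experts, d_out, r)  factor unique to each expert
      alpha: (n_experts, n_bases)   per-expert mixing coefficients
      bases: (n_bases, r, d_in)     basis matrices shared within one MoE layer
    """
    # B[i] = sum_j alpha[i, j] * bases[j], shape (n_experts, r, d_in)
    B = np.einsum("ij,jrk->irk", alpha, bases)
    # Batched matrix product: W[i] = A[i] @ B[i], shape (n_experts, d_out, d_in)
    return A @ B


def reconstruction_error(W, A, alpha, bases):
    """Mean squared error between original and reconstructed weights.

    Assumed form of the objective; the abstract only says that a
    reconstruction error is minimized.
    """
    return float(np.mean((W - mobe_reconstruct(A, alpha, bases)) ** 2))


def param_counts(n_experts, d_out, d_in, r, n_bases):
    """Parameter counts for one up/gate matrix type in one layer."""
    original = n_experts * d_out * d_in
    compressed = (
        n_experts * d_out * r      # per-expert A matrices
        + n_experts * n_bases      # per-expert mixing coefficients
        + n_bases * r * d_in       # shared basis matrices
    )
    return original, compressed


if __name__ == "__main__":
    # Toy dimensions chosen for illustration only.
    original, compressed = param_counts(
        n_experts=8, d_out=16, d_in=64, r=16, n_bases=4
    )
    print(f"parameter reduction: {1 - compressed / original:.1%}")
```

The saving comes from storing the large B-side factors once per layer as a small set of shared bases, so it grows as the number of bases shrinks relative to the number of experts.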

Xiaodong Chen, Mingming Ha, Zhenzhong Lan, Jing Zhang, Jianguo Li • 2025

Related benchmarks

Task	Dataset	Metric	Result
Language Modeling	WikiText-2	Perplexity (PPL)	6.27	2320
Language Modeling	C4	Perplexity	12.67	1565
Language Modeling	PTB	Perplexity	12.17	1234
Math Reasoning	GSM8K	Accuracy (GSM8K)	52.9	131
Code Generation	HumanEval	HumanEval Score	41.3	128
Commonsense Reasoning	Commonsense 8 Sub-Tasks	Accuracy (8 Sub-Tasks)	58.3	26
Machine Translation	Specialized Tasks Translation	Translation Quality Score	30.3	23
Intent Classification	Specialized Tasks Intent	Intent Accuracy	69.4	23
Machine Translation	Translation (test)	BLEU	27	20
Math	GSM8K (test)	Mean@4	54.3	18

Showing 10 of 13 rows
