
DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning

About

Despite the significant breakthrough of Mixture-of-Experts (MoE), the increasing scale of these MoE models presents huge memory and storage challenges. Existing MoE pruning methods, which involve reducing parameter size with a uniform sparsity across all layers, often lead to suboptimal outcomes and performance degradation due to varying expert redundancy in different MoE layers. To address this, we propose a non-uniform pruning strategy, dubbed Differentiable Expert Pruning (DiEP), which adaptively adjusts pruning rates at the layer level while jointly learning inter-layer importance, effectively capturing the varying redundancy across different MoE layers. By transforming the global discrete search space into a continuous one, our method handles exponentially growing non-uniform expert combinations, enabling adaptive gradient-based pruning. Extensive experiments on five advanced MoE models demonstrate the efficacy of our method across various NLP tasks. Notably, DiEP retains around 92% of original performance on Mixtral 8×7B with only half the experts, outperforming other pruning methods by up to 7.1% on the challenging MMLU dataset.
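The core idea in the abstract — relaxing the discrete "keep/drop this expert" choice into a continuous, learnable score, then pruning globally so each layer ends up with its own rate — can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the sigmoid relaxation, and the global top-k selection rule are illustrative assumptions standing in for the paper's learned gates.

```python
import math

def sigmoid(x):
    # Continuous relaxation: maps a learnable logit to a soft
    # keep-probability in (0, 1), so it can be trained by gradients.
    return 1.0 / (1.0 + math.exp(-x))

def prune_non_uniform(layer_gate_logits, global_keep_ratio):
    """Toy non-uniform pruning: score every expert in every layer with its
    (hypothetical) learned gate, then keep the globally top-scoring experts.
    Because selection is global rather than per-layer, layers with more
    redundancy naturally lose more experts (non-uniform per-layer rates)."""
    scored = [(sigmoid(logit), layer, expert)
              for layer, logits in enumerate(layer_gate_logits)
              for expert, logit in enumerate(logits)]
    scored.sort(reverse=True)  # highest keep-probability first
    n_keep = round(global_keep_ratio * len(scored))
    kept = {(layer, expert) for _, layer, expert in scored[:n_keep]}
    # Return, per layer, the indices of the experts that survive pruning.
    return [[e for e in range(len(logits)) if (layer, e) in kept]
            for layer, logits in enumerate(layer_gate_logits)]

# Two layers of four experts each, pruned to half the experts overall:
# layer 0's gates are mostly high, layer 1's mostly low, so layer 0
# keeps three experts while layer 1 keeps only one.
masks = prune_non_uniform([[2.0, 1.5, -1.0, 0.5],
                           [-2.0, -1.5, 3.0, 0.2]], global_keep_ratio=0.5)
print(masks)  # → [[0, 1, 3], [2]]
```

In the paper the gate values are learned jointly with inter-layer importance via gradients; the hard threshold here stands in for that training loop purely to show why a global, continuous criterion yields different pruning rates per layer.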

Sikai Bai, Haoxi Li, Jie Zhang, Zicong Hong, Song Guo • 2025

Related benchmarks

Task | Dataset | Result | Rank
Video Understanding | MVBench | Accuracy 66.82 | 247
Video Understanding | VideoMME | -- | 192
Chart Understanding | ChartQA | Accuracy 82.79 | 83
Visual Question Answering | TextVQA | Accuracy 87.43 | 69
Video Understanding | EgoSchema | Accuracy 58.74 | 49
Image Understanding | MME | Score 2.08e+3 | 39
Multi-modal Understanding | MMVet | Accuracy 68.13 | 35
Image Understanding | Image Understanding Suite (TextVQA, ChartQA, MMStar, MMBench, MMVet, MME, RealWorldQA, COCO) | TextVQA Score 82.04 | 34
Video Understanding | Video Understanding Suite (MVBench, EgoSchema, VMME, LVB, VMMMU) | MVBench Score 63.15 | 34
Real-world Visual Understanding | RealWorldQA | Accuracy 62.56 | 24

Showing 10 of 17 rows
