
MoE Pathfinder: Trajectory-driven Expert Pruning

About

Mixture-of-experts (MoE) architectures used in large language models (LLMs) achieve state-of-the-art performance across diverse tasks, yet face practical challenges such as deployment complexity and low activation efficiency. Expert pruning has thus emerged as a promising solution to reduce computational overhead and simplify the deployment of MoE models. However, existing expert pruning approaches typically rely on local importance metrics and apply uniform layer-wise pruning, leveraging only partial evaluation signals and overlooking the heterogeneous contributions of experts across layers. To address these limitations, we propose an expert pruning approach based on the trajectory of activated experts across layers, which treats the MoE model as a weighted computation graph and casts expert selection as a global optimal path planning problem. Within this framework, we integrate complementary importance signals from reconstruction error, routing probabilities, and activation strength at the trajectory level, which naturally yields non-uniform expert retention across layers. Experiments show that our approach achieves superior pruning performance on nearly all tasks compared with most existing approaches.
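The trajectory-based view described above can be illustrated with a small sketch. This is not the paper's implementation: the way the three importance signals are combined, the fully connected layer-to-layer edges, and the visit-count retention rule are all assumptions made for illustration. Each layer's experts are nodes with a combined importance score, a maximum-weight path through the layered graph is found by dynamic programming, and experts visited by the paths of many calibration samples are retained, which yields different retention counts per layer.

```python
import numpy as np

def expert_importance(recon_err, route_prob, act_strength, alpha=0.5, beta=0.3):
    # Hypothetical combination of the three signals named in the abstract;
    # the paper's exact weighting scheme is not reproduced here.
    return alpha * route_prob + beta * act_strength - (1 - alpha - beta) * recon_err

def best_trajectory(node_scores):
    """Max-weight path through a layered expert graph via dynamic programming.

    node_scores: list of 1-D arrays, one per layer; entry e is the combined
    importance of expert e in that layer. Every expert in layer l is assumed
    to connect to every expert in layer l+1 (a simplifying assumption).
    Returns the index of the expert chosen in each layer.
    """
    n_layers = len(node_scores)
    best = [np.asarray(node_scores[0], dtype=float)]  # best path score ending at each node
    back = []                                         # backpointers per layer
    for l in range(1, n_layers):
        prev = best[-1]
        j = int(np.argmax(prev))  # fully connected: one best predecessor for all nodes
        back.append(np.full(len(node_scores[l]), j, dtype=int))
        best.append(prev[j] + np.asarray(node_scores[l], dtype=float))
    # Backtrack from the best final node to recover the full trajectory.
    path = [int(np.argmax(best[-1]))]
    for l in range(n_layers - 2, -1, -1):
        path.append(int(back[l][path[-1]]))
    return path[::-1]

def retain_experts(trajectories, n_experts_per_layer, min_visits=1):
    """Keep experts visited by at least `min_visits` trajectories per layer.

    Because visit counts differ across layers, retention is naturally
    non-uniform: some layers keep more experts than others.
    """
    keep = []
    for l, n in enumerate(n_experts_per_layer):
        visits = np.bincount([t[l] for t in trajectories], minlength=n)
        keep.append(np.where(visits >= min_visits)[0])
    return keep
```

For example, `best_trajectory([[1, 3], [2, 0], [0, 5]])` returns `[1, 0, 1]`: with fully connected layers, the max-weight path picks the highest-scoring expert in each layer. Running it over many calibration inputs and feeding the paths to `retain_experts` gives a per-layer pruning mask.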

Xican Yang, Yuanhe Tian, Yan Song • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 60.42 | 1460 |
| Commonsense Reasoning | WinoGrande | Accuracy | 74.19 | 776 |
| Language Understanding | MMLU | Accuracy | 57.66 | 756 |
| Mathematical Reasoning | GSM8K | Accuracy (GSM8K) | 38.67 | 358 |
| Multitask Language Understanding | MMLU | Accuracy | 57.66 | 206 |
| Question Answering | ARC | Accuracy | 49.66 | 154 |
| Medical Question Answering | MedQA | Accuracy | 49.25 | 109 |
| Science Question Answering | ARC | ARC Accuracy | 53.18 | 23 |
