ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns

About

Mixture-of-Experts (MoE) effectively scales model capacity while preserving computational efficiency through sparse expert activation. However, training high-quality MoEs from scratch is prohibitively expensive. A promising alternative is to convert pretrained dense models into sparse MoEs. Existing dense-to-MoE methods fall into two categories: dynamic structural pruning, which converts dense models into MoE architectures with moderate sparsity to balance performance and inference efficiency, and downcycling approaches, which use pretrained dense models to initialize highly sparse MoE architectures. However, existing methods break the intrinsic activation patterns within dense models, leading to suboptimal expert construction. In this work, we argue that the Gated Linear Unit (GLU) mechanism provides a natural blueprint for dense-to-MoE conversion. We show that the fine-grained neuron-wise activation patterns of GLU reveal a coarse-grained structure, uncovering an inherent MoE architecture composed of consistently activated universal neurons and dynamically activated specialized neurons. Leveraging this discovery, we introduce ExpertWeaver, a training-free framework that partitions neurons according to their activation patterns and constructs shared experts and specialized routed experts with layer-adaptive configurations. Our experiments demonstrate that ExpertWeaver significantly outperforms existing methods, both as a training-free dynamic structural pruning technique and as a downcycling strategy for superior MoE initialization.
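
The abstract describes partitioning GLU neurons into consistently activated "universal" neurons (shared experts) and dynamically activated "specialized" neurons (routed experts), but it does not give the procedure. The following is only a minimal sketch of that general idea, not ExpertWeaver's actual algorithm. Everything specific in it is an assumption made for illustration: the SwiGLU-style activation, the firing threshold, the `shared_rate` cutoff, the per-token group labels, and the function names `glu_activations`, `activation_rates`, and `partition_neurons`.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) nonlinearity, assumed here as the GLU gate activation."""
    return x / (1.0 + math.exp(-x))


def glu_activations(x, w_gate, w_up):
    """Per-neuron GLU hidden activation: silu(x . w_gate[j]) * (x . w_up[j])."""
    acts = []
    for gate_row, up_row in zip(w_gate, w_up):
        g = sum(a * b for a, b in zip(x, gate_row))
        u = sum(a * b for a, b in zip(x, up_row))
        acts.append(silu(g) * u)
    return acts


def activation_rates(token_acts, threshold):
    """Fraction of tokens on which each neuron's |activation| exceeds threshold."""
    n_tokens = len(token_acts)
    n_neurons = len(token_acts[0])
    return [
        sum(abs(row[j]) > threshold for row in token_acts) / n_tokens
        for j in range(n_neurons)
    ]


def partition_neurons(token_acts, token_groups, threshold=0.1, shared_rate=0.9):
    """Split neurons into a shared set and per-group routed sets.

    Assumption: a neuron firing on at least `shared_rate` of tokens is
    "universal"; every other neuron joins the token group where it fires
    most often (ties go to the alphabetically first group).
    """
    rates = activation_rates(token_acts, threshold)
    groups = sorted(set(token_groups))
    shared, routed = [], {g: [] for g in groups}
    for j, rate in enumerate(rates):
        if rate >= shared_rate:
            shared.append(j)
            continue
        counts = {g: 0 for g in groups}
        for row, g in zip(token_acts, token_groups):
            if abs(row[j]) > threshold:
                counts[g] += 1
        routed[max(counts, key=counts.get)].append(j)
    return shared, routed


# Toy data: 4 tokens x 4 neurons of precomputed GLU activations (hypothetical).
token_acts = [
    [1.0, 0.9, 0.0, 0.0],
    [1.2, 0.8, 0.0, 0.05],
    [0.9, 0.0, 0.7, 0.0],
    [1.1, 0.0, 0.6, 0.3],
]
token_groups = ["code", "code", "math", "math"]

shared, routed = partition_neurons(token_acts, token_groups)
print(shared, routed)
```

In this toy input, neuron 0 fires on every token and lands in the shared set, while the remaining neurons are grouped by the token group on which they fire most. A real conversion would have to choose thresholds and expert counts per layer, which the abstract refers to as layer-adaptive configurations without specifying them.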

Ziyu Zhao, Tong Zhu, Zhi Zhang, Tiantian Fan, Jinluan Yang, Kun Kuang, Zhongyu Wei, Fei Wu, Yu Cheng • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Commonsense Reasoning | WinoGrande | Accuracy 65.3 | 1442 |
| Code Generation | HumanEval (test) | -- | 612 |
| Physical Interaction Question Answering | PIQA | Accuracy 78 | 415 |
| Mathematical Reasoning | GSM8K | Accuracy 34.9 | 388 |
| Science Question Answering | ARC Easy | Accuracy 72.4 | 162 |
| Language Understanding | MMLU 5-shot | -- | 153 |
| Language Understanding | MMLU 5-shot (test) | -- | 149 |
| Science Question Answering | SciQ | Normalized Accuracy 91.8 | 137 |
| Logical Reasoning | LogiQA | Accuracy 29 | 100 |
| Instruction Following | IFEval (test) | IFEval Score 33.1 | 88 |
Showing 10 of 17 rows
