
GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs

About

Mixture-of-Experts Large Language Models (MoE-LLMs) achieve strong performance but incur substantial memory overhead due to massive expert parameters. Mixed-precision quantization mitigates this cost by allocating expert-wise bit-widths based on their importance, approaching the accuracy-memory Pareto frontier and enabling extreme low-bit quantization. However, existing methods rely on layer-wise importance estimation and overlook router shifts induced by quantization, resulting in suboptimal allocation and routing. In this work, we propose Global Expert-level Mixed-precision Quantization (GEMQ) to overcome these limitations via (1) a global linear-programming formulation that captures model-wide expert importance based on quantization error analysis, and (2) efficient router fine-tuning to adapt routing to quantized experts. These components are integrated into a progressive quantization framework that iteratively refines importance estimation and allocation. Experiments demonstrate that GEMQ significantly reduces memory and accelerates inference with minimal accuracy degradation. Source code is available at https://github.com/jndeng/GEMQ.
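
To make the idea of global (model-wide) bit-width allocation concrete, here is a minimal sketch. It is not the GEMQ linear program: it swaps in a simple greedy marginal-gain heuristic, and the per-expert error scores, the `(layer, expert)` keys, the candidate bit-widths, and the function name `allocate_bits` are all hypothetical. The only point it illustrates is that experts from every layer compete for one shared bit budget, rather than each layer being budgeted separately.

```python
import heapq

def allocate_bits(errors, avg_bits, choices=(2, 3, 4)):
    """Greedy stand-in for a global bit-width allocation.

    errors[e][b] is an assumed quantization-error score for expert e at b bits.
    Returns {expert: bits} with a mean bit-width of at most avg_bits.
    """
    choices = sorted(choices)
    level = {e: 0 for e in errors}  # index into choices; start at the lowest bit-width
    budget = avg_bits * len(errors) - choices[0] * len(errors)  # extra bits to spend

    def upgrade(e):
        # Describe the next possible upgrade for expert e, or None if already at max.
        i = level[e]
        if i + 1 >= len(choices):
            return None
        cost = choices[i + 1] - choices[i]
        gain = errors[e][choices[i]] - errors[e][choices[i + 1]]
        return (-gain / cost, e, cost)  # negated so the min-heap pops the best gain first

    # One heap over ALL experts in ALL layers: this is the "global" part.
    heap = [u for u in map(upgrade, errors) if u is not None]
    heapq.heapify(heap)
    while heap:
        _, e, cost = heapq.heappop(heap)
        if cost > budget:
            continue  # this upgrade no longer fits; try cheaper candidates
        budget -= cost
        level[e] += 1
        nxt = upgrade(e)
        if nxt is not None:
            heapq.heappush(heap, nxt)
    return {e: choices[i] for e, i in level.items()}

# Hypothetical error scores for 2 layers x 2 experts, keyed by (layer, expert).
errors = {
    (0, 0): {2: 9.0, 3: 4.0, 4: 3.0},
    (0, 1): {2: 1.0, 3: 0.5, 4: 0.4},
    (1, 0): {2: 6.0, 3: 2.0, 4: 0.5},
    (1, 1): {2: 0.8, 3: 0.6, 4: 0.5},
}
plan = allocate_bits(errors, avg_bits=3.0)
print(plan)
```

With these made-up scores, the two error-sensitive experts receive 4 bits and the two insensitive ones stay at 2 bits, for an average of 3 bits. An LP solver would replace the greedy loop in a formulation like the one the abstract describes; the abstract does not specify the objective or constraints, so none are assumed here.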

Jianing Deng, Song Wang, Dongwei Wang, Zijie Liu, Tianlong Chen, Huanrui Yang, Jingtong Hu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Language Modeling | WikiText2 | Perplexity: 4.37 | 3785 |
| Language Modeling | WikiText-2 (test) | PPL: 4.37 | 2333 |
| Language Modeling | C4 (test) | Perplexity: 8.06 | 464 |
| Multi-task Language Understanding | MMLU | MMLU Accuracy: 64.63 | 442 |
| Mathematical Reasoning | MathQA | Accuracy: 38.63 | 354 |
| Question Answering | BoolQ | Accuracy: 85.02 | 201 |
| Language Understanding | MMLU 5-shot | -- | 153 |
| Zero-shot Reasoning | ZeroShot 7 | Accuracy: 65.69 | 56 |
| Commonsense Reasoning | Reasoning Suite Zero-shot Aggregate | Aggregate Score: 59.49 | 50 |
| Zero-shot Reasoning and General Knowledge | Evaluation Suite Zero-shot (PIQA, ARC-Easy, ARC-Challenge, HellaSwag, WinoGrande, MathQA, MMLU) | PIQA (PQ) Accuracy: 81.07 | 40 |
Showing 10 of 13 rows
