
MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization

About

Mixture-of-Experts (MoE) based large language models (LLMs) offer strong performance but suffer from high memory and computation costs. Weight binarization provides extreme efficiency, yet existing binary methods designed for dense LLMs struggle with MoE-specific issues, including cross-expert redundancy, task-agnostic importance estimation, and quantization-induced routing shifts. To address these issues, we propose MoBiE, the first binarization framework tailored for MoE-based LLMs. MoBiE is built on three core innovations: (1) joint SVD decomposition to reduce cross-expert redundancy; (2) integration of global loss gradients into local Hessian metrics to improve weight-importance estimation; (3) an error constraint guided by the input null space to mitigate routing distortion. Notably, MoBiE achieves these optimizations while incurring no additional storage overhead, striking a balance between efficiency and model performance. Extensive experiments demonstrate that MoBiE consistently outperforms state-of-the-art binary methods across multiple MoE-based LLMs and benchmarks. For example, on Qwen3-30B-A3B, MoBiE reduces perplexity by 52.2%, improves average zero-shot performance by 43.4%, achieves over 2× inference speedup, and further shortens quantization time. The code is available at https://github.com/Kishon-zzx/MoBiE.
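As a rough illustration of how a shared low-rank basis can exploit cross-expert redundancy before binarization, the following NumPy sketch (a toy construction of ours, not the paper's actual algorithm; the expert sizes, the rank-8 choice, and all variable names are assumptions) stacks several expert weight matrices, extracts a joint SVD basis, and binarizes only the per-expert residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, d_in, d_out, rank = 4, 64, 32, 8

# Simulated expert weights that share a common structure (MoE experts
# fine-tuned from one base layer tend to be highly correlated).
shared = rng.standard_normal((d_in, d_out))
experts = [shared + 0.1 * rng.standard_normal((d_in, d_out))
           for _ in range(num_experts)]

# Joint SVD: factor the horizontally stacked experts to obtain a single
# left basis U shared by all experts, and keep its top-`rank` columns.
stacked = np.concatenate(experts, axis=1)          # (d_in, num_experts * d_out)
U, S, Vt = np.linalg.svd(stacked, full_matrices=False)
U_r = U[:, :rank]

compressed = []
for W in experts:
    low_rank = U_r @ (U_r.T @ W)   # shared component, kept in full precision
    residual = W - low_rank        # expert-specific part
    alpha = np.abs(residual).mean()  # scalar scale minimizing L2 error of sign()
    B = np.sign(residual)            # 1-bit residual weights
    compressed.append((low_rank, alpha, B))

# Reconstruction error vs. naively binarizing each full weight matrix.
err_joint = np.mean([np.linalg.norm(W - (L + a * B))
                     for W, (L, a, B) in zip(experts, compressed)])
err_naive = np.mean([np.linalg.norm(W - np.abs(W).mean() * np.sign(W))
                     for W in experts])
```

In this toy setup `err_joint` comes out well below `err_naive`, because the binarized part only has to represent the small expert-specific residual rather than the full weight magnitude. The real method presumably shares the basis across experts so the extra factors add little storage, consistent with the paper's no-overhead claim, but the details here are illustrative only.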

Zhixiong Zhao, Zukang Xu, Zhixuan Chen, Dawei Yang • 2026

Related benchmarks

| Task                               | Dataset         | Result                | Rank |
|------------------------------------|-----------------|-----------------------|------|
| Language Modeling                  | WikiText2       | Perplexity 12.8       | 2839 |
| Question Answering                 | ARC Easy        | --                    | 597  |
| Question Answering                 | PIQA            | Accuracy 76.42        | 374  |
| Mathematical Reasoning             | MathQA          | Accuracy 30.25        | 305  |
| Sentence Completion                | HellaSwag       | Accuracy 65.41        | 276  |
| Multiple-choice Question Answering | ARC Easy        | Accuracy 69.57        | 188  |
| Question Answering                 | ARC Challenge   | Accuracy (ARC) 43.17  | 142  |
| Reasoning                          | WinoGrande (WG) | Accuracy 65.9         | 135  |
| Language Modeling                  | Lambada OpenAI  | Accuracy 52.01        | 127  |
| Code Generation                    | HumanEval       | HumanEval Score 27.12 | 93   |

Showing 10 of 17 rows
