Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale

About

Mixture-of-Experts (MoE) architectures are evolving towards finer granularity to improve parameter efficiency. However, existing MoE designs face an inherent trade-off between the granularity of expert specialization and hardware execution efficiency. We propose OmniMoE, a system-algorithm co-designed framework that pushes expert granularity to its logical extreme. OmniMoE introduces vector-level Atomic Experts, enabling scalable routing and execution within a single MoE layer, while retaining a shared dense MLP branch for general-purpose processing. Although this atomic design maximizes capacity, it poses severe challenges for routing complexity and memory access. To address these, OmniMoE adopts a system-algorithm co-design: (i) a Cartesian Product Router that decomposes the massive index space to reduce routing complexity from O(N) to O(sqrt(N)); and (ii) Expert-Centric Scheduling that inverts the execution order to turn scattered, memory-bound lookups into efficient dense matrix operations. Validated on seven benchmarks, OmniMoE (with 1.7B active parameters) achieves 50.9% zero-shot accuracy across seven benchmarks, outperforming coarse-grained (e.g., DeepSeekMoE) and fine-grained (e.g., PEER) baselines. Crucially, OmniMoE reduces inference latency from 73ms to 6.7ms (a 10.9-fold speedup) compared to PEER, demonstrating that massive-scale fine-grained MoE can be fast and accurate. Our code is open-sourced at https://github.com/flash-algo/omni-moe.

Jingze Shi, Zhangyang Peng, Yizhang Zhu, Yifan Wu, Guang Liu, Yuyu Luo• 2026

Related benchmarks

TaskDatasetResultRank
Commonsense ReasoningHellaSwag
Accuracy60.9
1460
Commonsense ReasoningWinoGrande
Accuracy59.7
776
Language UnderstandingMMLU
Accuracy37.5
756
Commonsense ReasoningPIQA
Accuracy78.7
647
Question AnsweringOBQA
Accuracy40.3
276
Question AnsweringARC
Accuracy61
154
Question AnsweringTriviaQA
Accuracy18.5
85
Showing 7 of 7 rows

Other info

GitHub

Follow for update