
Routing-Free Mixture-of-Experts

About

Standard Mixture-of-Experts (MoE) models rely on centralized routing mechanisms that introduce rigid inductive biases. We propose Routing-Free MoE, which eliminates hard-coded centralized components such as external routers, Softmax, Top-K selection, and load balancing; instead, all activation functionality is encapsulated within individual experts and optimized directly through continuous gradient flow, so that each expert determines its own activation entirely on its own. We also introduce a unified adaptive load-balancing framework that jointly optimizes expert-balancing and token-balancing objectives through a configurable interpolation, allowing flexible and customizable resource allocation. Extensive experiments show that Routing-Free MoE consistently outperforms baselines with better scalability and robustness. We analyze its behavior in detail and offer insights that may facilitate future MoE design and optimization.

Yilun Liu, Jinru Han, Sikuan Yan, Volker Tresp, Yunpu Ma • 2026
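The abstract above describes two ideas: each expert carries its own activation mechanism trained by ordinary gradients (no central router, Softmax, or Top-K), and an auxiliary loss interpolates between expert-balancing and token-balancing objectives. The sketch below is a minimal, hypothetical PyTorch illustration of that setup, not the authors' implementation; the per-expert sigmoid gate, the variance-based balancing terms, and the interpolation weight `alpha` are all assumptions made for clarity.

```python
import torch
import torch.nn as nn


class SelfGatedExpert(nn.Module):
    """One expert that decides its own activation strength.

    Hypothetical sketch: each expert owns a small gate whose sigmoid output
    scales its contribution, so no central router, Softmax, or Top-K
    selection is needed; the gate is trained by plain gradient flow.
    """

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )
        self.gate = nn.Linear(d_model, 1)  # per-token, per-expert activation score

    def forward(self, x: torch.Tensor):
        score = torch.sigmoid(self.gate(x))          # (tokens, 1)
        return score * self.ffn(x), score.squeeze(-1)


class RoutingFreeMoE(nn.Module):
    """Minimal routing-free MoE layer (illustrative only)."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int, alpha: float = 0.5):
        super().__init__()
        self.experts = nn.ModuleList(
            SelfGatedExpert(d_model, d_hidden) for _ in range(n_experts)
        )
        self.alpha = alpha  # assumed interpolation weight between the two balancing terms

    def forward(self, x: torch.Tensor):
        outputs, scores = [], []
        for expert in self.experts:
            y, s = expert(x)
            outputs.append(y)
            scores.append(s)
        scores = torch.stack(scores, dim=-1)              # (tokens, n_experts)
        out = torch.stack(outputs, dim=-1).sum(dim=-1)     # sum of self-gated expert outputs

        # Unified load-balancing loss (assumed form): push per-expert mean
        # activation toward uniform use (expert balancing) and per-token total
        # activation toward a common budget (token balancing), blended by alpha.
        expert_load = scores.mean(dim=0)                   # how strongly each expert fires
        token_load = scores.sum(dim=-1)                    # how much capacity each token uses
        expert_balance = expert_load.var()
        token_balance = ((token_load - token_load.mean()) ** 2).mean()
        aux_loss = self.alpha * expert_balance + (1.0 - self.alpha) * token_balance
        return out, aux_loss


# Usage sketch: add aux_loss (scaled by a small coefficient) to the task loss.
if __name__ == "__main__":
    layer = RoutingFreeMoE(d_model=64, d_hidden=256, n_experts=8, alpha=0.5)
    tokens = torch.randn(32, 64)
    out, aux = layer(tokens)
    print(out.shape, aux.item())
```

Setting `alpha` closer to 1 emphasizes even utilization across experts, while values closer to 0 emphasize a uniform activation budget per token; the configurable interpolation mentioned in the abstract corresponds to choosing this trade-off.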

Related benchmarks

Task                              Dataset        Metric                 Result   Rank
Commonsense Reasoning             HellaSwag      --                     --       1891
Commonsense Reasoning             WinoGrande     Accuracy               50.59    1085
Question Answering                ARC Challenge  --                     --       906
Question Answering                OpenBookQA     Normalized Accuracy    26.6     102
Language Modeling                 OpenWebText    Perplexity             19.97    91
Question Answering                ARC Easy       Normalized Accuracy    37.46    18
Commonsense Reasoning             PIQA           Normalized Accuracy    58.92    13
Natural Language Understanding    GLUE           QQP Accuracy           39.93    8
