Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts
About
Mixture-of-Experts (MoE) architectures scale large language models efficiently by employing a parametric "router" to dispatch tokens to a sparse subset of experts. Typically, this router is trained once and then frozen, rendering routing decisions brittle under distribution shifts. We address this limitation by introducing kNN-MoE, a retrieval-augmented routing framework that reuses optimal expert assignments from a memory of similar past cases. This memory is constructed offline by directly optimizing token-wise routing logits to maximize the likelihood on a reference set. Crucially, we use the aggregate similarity of retrieved neighbors as a confidence-driven mixing coefficient, thus allowing the method to fall back to the frozen router when no relevant cases are found. Experiments show kNN-MoE outperforms zero-shot baselines and rivals computationally expensive supervised fine-tuning.
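The retrieval-augmented routing described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: all names (`knn_moe_route`, `memory_keys`, `memory_logits`, the temperature `tau`) are assumptions, and the confidence coefficient is taken here to be the mean top-k cosine similarity, which is one plausible reading of "aggregate similarity of retrieved neighbors".

```python
import numpy as np

def knn_moe_route(h, router_w, memory_keys, memory_logits, k=4, tau=1.0):
    """Blend frozen-router logits with logits retrieved from a kNN memory.

    h: (d,) token hidden state; router_w: (num_experts, d) frozen router.
    memory_keys: (N, d) stored hidden states; memory_logits: (N, num_experts)
    routing logits optimized offline on a reference set.
    Names and signatures are illustrative, not the paper's actual API.
    """
    # Frozen parametric router's logits for this token.
    router_logits = router_w @ h

    # Cosine similarity to all memory entries; keep the top-k neighbors.
    sims = memory_keys @ h / (
        np.linalg.norm(memory_keys, axis=1) * np.linalg.norm(h) + 1e-8)
    idx = np.argsort(sims)[-k:]
    top_sims = sims[idx]

    # Similarity-weighted average of the retrieved routing logits.
    w = np.exp(top_sims / tau)
    w /= w.sum()
    retrieved_logits = w @ memory_logits[idx]

    # Aggregate neighbor similarity acts as a confidence-driven mixing
    # coefficient: low similarity falls back to the frozen router.
    lam = np.clip(top_sims.mean(), 0.0, 1.0)
    return lam * retrieved_logits + (1.0 - lam) * router_logits
```

When no relevant neighbors exist (similarities near zero), `lam` vanishes and the output reduces to the frozen router's logits, giving the fallback behavior the abstract describes.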
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Understanding | MMLU | Accuracy | 47.81 | 756 |
| Question Answering | GPQA | Accuracy | 29.8 | 258 |
| Medical Question Answering | MedMCQA (test) | Accuracy | 66.65 | 134 |
| Question Answering | MedQA-USMLE (test) | Accuracy | 76.7 | 101 |
| Question Answering | GPQA (test) | Accuracy | 45.45 | 55 |
| Question Answering | MMLU (test) | Accuracy | 78.86 | 15 |
| Question Answering | SuperGPQA (test) | Accuracy | 35.15 | 15 |
| Language Understanding | USMLE | Accuracy | 35.04 | 3 |