Dr.LLM: Dynamic Layer Routing in LLMs

About

Large Language Models (LLMs) process every token through all layers of a transformer stack, which wastes computation on simple queries and offers insufficient flexibility for harder ones that need deeper reasoning. Adaptive-depth methods can improve efficiency, but prior approaches rely on costly inference-time search, architectural changes, or large-scale retraining, and in practice often degrade accuracy despite efficiency gains. We introduce Dr.LLM (Dynamic routing of Layers for LLMs), a retrofittable framework that equips pretrained models with lightweight per-layer routers that decide whether to skip, execute, or repeat a block. Routers are trained with explicit supervision: using Monte Carlo Tree Search (MCTS), we derive high-quality layer configurations that preserve or improve accuracy under a compute budget. Our design choices (windowed pooling for stable routing, focal loss with class balancing, and bottleneck MLP routers) ensure robustness under class imbalance and long sequences. On ARC (logic) and DART (math), Dr.LLM improves accuracy by up to +3.4%p (percentage points) while saving 5 layers per example on average. Routers generalize to out-of-domain tasks (MMLU, GSM8K, AIME, TruthfulQA, SQuADv2, GPQA, PIQA, AGIEval) with only a 0.85% accuracy drop while retaining efficiency, and outperform prior routing methods by up to +7.7%p. Overall, Dr.LLM shows that explicitly supervised routers can retrofit frozen LLMs for budget-aware, accuracy-driven inference without altering base weights. Code is available at https://github.com/parameterlab/dr-llm.
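
The three ingredients named in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation (see the linked repository for that). The function names, the 0/1/2 label encoding, the use of mean pooling over contiguous windows, and the reading of "repeat" as "run the block twice" are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical label encoding for the three routing decisions.
SKIP, EXECUTE, REPEAT = 0, 1, 2


def windowed_mean_pool(hidden: Sequence[Sequence[float]], num_windows: int) -> List[List[float]]:
    """Split the token axis into contiguous windows and mean-pool each one.

    `hidden` is a list of per-token feature vectors. Pooling per window
    (assumed here to be a plain mean) keeps the router input size fixed
    regardless of sequence length.
    """
    n = len(hidden)
    num_windows = max(1, min(num_windows, n))  # never more windows than tokens
    pooled = []
    for w in range(num_windows):
        start = w * n // num_windows
        end = (w + 1) * n // num_windows
        chunk = hidden[start:end]
        dim = len(chunk[0])
        pooled.append([sum(tok[d] for tok in chunk) / len(chunk) for d in range(dim)])
    return pooled


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits: Sequence[float], target: int,
               class_weights: Sequence[float], gamma: float = 2.0) -> float:
    """Class-weighted focal loss for one example.

    With gamma = 0 and unit weights this reduces to ordinary cross-entropy.
    Larger gamma down-weights examples the router already classifies well,
    and `class_weights` can up-weight rare decisions such as SKIP or REPEAT.
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


def run_with_routing(x, layers: Sequence[Callable], decisions: Sequence[int]):
    """Apply a layer stack under per-layer skip / execute / repeat decisions.

    Assumption: REPEAT means the same block is applied twice in a row.
    """
    for layer, decision in zip(layers, decisions):
        if decision == SKIP:
            continue
        x = layer(x)
        if decision == REPEAT:
            x = layer(x)
    return x
```

As a toy usage example, `run_with_routing(1, [lambda v: v + 1, lambda v: v * 10, lambda v: v + 3], [EXECUTE, SKIP, REPEAT])` applies the first layer once, skips the second, and applies the third twice, returning 8. In a real model the layers would be transformer blocks and the decisions would come from the trained per-layer routers rather than a fixed list.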

Ahmed Heakl, Martin Gubri, Salman Khan, Sangdoo Yun, Seong Joon Oh • 2025

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	PIQA	Accuracy 79.2	400
Mathematical Reasoning	ASDIV	Accuracy 0.591	280
Mathematical Reasoning	MAWPS	Accuracy 41.3	279
Reasoning	ARC	Accuracy 94.5	269
Multitask Language Understanding	MMLU	--	263
Factuality	TruthfulQA	Accuracy 47.9	145
Mathematical Reasoning	GSM8K	Accuracy 75.7	80
Mathematical Reasoning	DART-Math DM-1	Pass@k Accuracy 53.6	71
Mathematical Reasoning	DART-Math DM-4	Pass@k Accuracy 33.4	64
Mathematical Reasoning	DART-Math DM-5	Pass@k Accuracy 21.6	64

Showing 10 of 22 rows
