MTRouter: Cost-Aware Multi-Turn LLM Routing with History-Model Joint Embeddings

About

Multi-turn, long-horizon tasks are increasingly common for large language models (LLMs), but solving them typically requires many sequential model invocations, accumulating substantial inference costs. Here, we study cost-aware multi-turn LLM routing: selecting which model to invoke at each turn from a model pool, given a fixed cost budget. We propose MTRouter, which encodes the interaction history and candidate models into joint history-model embeddings, and learns an outcome estimator from logged trajectories to predict turn-level model utility. Experiments show that MTRouter improves the performance-cost trade-off: on ScienceWorld, it surpasses GPT-5 while reducing total cost by 58.7%; on Humanity's Last Exam (HLE), it achieves competitive accuracy while reducing total cost by 43.4% relative to GPT-5, and these gains even carry over to held-out tasks. Further analyses reveal several mechanisms underlying its effectiveness: relative to prior multi-turn routers, MTRouter makes fewer model switches, is more tolerant to transient errors, and exhibits emergent specialization across models. Code: https://github.com/ZhangYiqun018/MTRouter
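
The routing loop the abstract describes can be illustrated with a minimal sketch. At each turn, the history is embedded together with each candidate model, a learned estimator predicts turn-level utility, and a model is chosen under the cost budget. Everything below is an illustrative assumption, not MTRouter's implementation. That includes the bag-of-words history encoder, the outer-product joint features, the linear estimator trained by SGD on logged (history, model, outcome) triples, and the "predicted utility minus lam × cost" selection rule with a hard remaining-budget filter. All names (`OutcomeEstimator`, `route`, `lam`, the toy costs and logs) are hypothetical. The authors' code is at the repository linked above.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical log format: (history turns, model index, observed outcome in [0, 1]).
LogEntry = Tuple[List[str], int, float]


def embed_history(history: Sequence[str], vocab: Sequence[str]) -> List[float]:
    """Toy history encoder (assumption): L2-normalized bag-of-words over a
    fixed vocabulary, standing in for a learned text encoder."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = [0.0] * len(vocab)
    for turn in history:
        for tok in turn.lower().split():
            if tok in index:
                counts[index[tok]] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def joint_embedding(h: List[float], model: int, n_models: int) -> List[float]:
    """Concatenate the shared history vector, a model one-hot, and the history
    copied into the chosen model's block (outer product with the one-hot), so
    the estimator can learn model-specific responses to the same history."""
    one_hot = [1.0 if m == model else 0.0 for m in range(n_models)]
    interaction = [o * x for o in one_hot for x in h]
    return h + one_hot + interaction


class OutcomeEstimator:
    """Hypothetical linear turn-level utility estimator, fit by SGD on squared error."""

    def __init__(self, vocab: Sequence[str], n_models: int) -> None:
        self.vocab = list(vocab)
        self.n_models = n_models
        # shared history + one-hot + per-model history blocks
        self.w = [0.0] * (len(vocab) * (n_models + 1) + n_models)
        self.b = 0.0

    def _features(self, history: Sequence[str], model: int) -> List[float]:
        h = embed_history(history, self.vocab)
        return joint_embedding(h, model, self.n_models)

    def _score(self, x: List[float]) -> float:
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def predict(self, history: Sequence[str], model: int) -> float:
        return self._score(self._features(history, model))

    def fit(self, logs: Sequence[LogEntry], epochs: int = 500,
            lr: float = 0.1) -> "OutcomeEstimator":
        for _ in range(epochs):
            for history, model, outcome in logs:
                x = self._features(history, model)
                err = self._score(x) - outcome  # d/d(score) of 0.5 * err**2
                self.b -= lr * err
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
        return self


def route(est: OutcomeEstimator, history: Sequence[str], costs: Sequence[float],
          remaining_budget: float, lam: float = 0.05) -> Optional[int]:
    """Pick the affordable model with the highest predicted utility minus
    lam * cost; return None if no model fits the remaining budget."""
    affordable = [m for m, c in enumerate(costs) if c <= remaining_budget]
    if not affordable:
        return None
    return max(affordable, key=lambda m: est.predict(history, m) - lam * costs[m])


if __name__ == "__main__":
    vocab = ["easy", "hard", "search"]
    costs = [1.0, 3.0, 10.0]  # hypothetical per-call costs: cheap, mid, expensive
    logs = [
        (["search easy"], 0, 1.0), (["search easy"], 1, 1.0), (["search easy"], 2, 1.0),
        (["search hard"], 0, 0.0), (["search hard"], 1, 0.0), (["search hard"], 2, 1.0),
    ]
    est = OutcomeEstimator(vocab, n_models=3).fit(logs)
    print(route(est, ["search easy"], costs, remaining_budget=20.0))  # 0: cheap model suffices
    print(route(est, ["search hard"], costs, remaining_budget=20.0))  # 2: only the strong model succeeds
    print(route(est, ["search hard"], costs, remaining_budget=5.0))   # 0: budget rules out model 2
```

In this toy run, the cheap model is chosen when the estimator predicts it will succeed. The expensive model is chosen only when the cheaper ones are predicted to fail and the budget allows it. A practical router would replace the toy encoder with learned embeddings and tune `lam` (or the budget filter) toward a target cost.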

Yiqun Zhang, Hao Li, Zihan Wang, Shi Feng, Xiaocui Yang, Daling Wang, Bo Zhang, Lei Bai, Shuyue Hu • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Interactive Decision-making | ScienceWorld (test) | Score | 53.8 | 14
Interactive Decision-making | ScienceWorld (OOD) | Score | 9.9 | 14
Reasoning | HLE (test) | Accuracy | 26 | 14
Reasoning | HLE (OOD) | Accuracy | 38.6 | 14
LLM Routing | LLM Routing Systems | Latency Overhead (ms) | 50 | 5