From Myopic Selection to Long-Horizon Awareness: Sequential LLM Routing for Multi-Turn Dialogue

About

Multi-turn dialogue is the predominant form of interaction with large language models (LLMs). While LLM routing is effective in single-turn settings, existing methods fail to maximize cumulative performance in multi-turn dialogue due to interaction dynamics and delayed rewards. To address this challenge, we move from myopic, single-turn selection to long-horizon sequential routing for multi-turn dialogue. Accordingly, we propose DialRouter, which first performs Monte Carlo Tree Search (MCTS) to explore the dialogue branches induced by different LLM selections and to collect trajectories with high cumulative rewards. DialRouter then learns a lightweight routing policy from the search-derived data, augmented with retrieval-based future state approximation, enabling multi-turn routing without online search. Experiments on both open-domain and domain-specific dialogue tasks, across diverse candidate sets of open-source and closed-source LLMs, demonstrate that DialRouter significantly outperforms single LLMs and existing routing baselines in task success rate, while achieving a superior performance-cost trade-off when combined with a cost-aware reward.
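The pipeline described in the abstract has two phases: an offline search phase that explores which LLM to route each turn to and logs the choices with the highest estimated cumulative reward, and a distillation phase that trains a lightweight policy on those logged choices. Below is a minimal Python sketch of the search phase under strong simplifying assumptions: flat Monte Carlo rollouts stand in for full MCTS, the dialogue environment and per-turn reward are randomized placeholders, and the cost-aware reward is folded in as a simple quality-minus-cost term. Every name here (`CandidateLLM`, `simulate_turn`, `rollout_value`, `cost_weight`) is a hypothetical stand-in, not the paper's API.

```python
# Sketch only: flat Monte Carlo rollouts approximate the paper's MCTS phase,
# and all rewards/environments are randomized placeholders.
import random
from dataclasses import dataclass, field

@dataclass
class CandidateLLM:
    name: str
    cost_per_turn: float  # hypothetical per-call cost used by the cost-aware reward

@dataclass
class DialogueState:
    history: list = field(default_factory=list)  # (model, reply) pairs so far

def simulate_turn(state: DialogueState, llm: CandidateLLM):
    """Placeholder environment step: appends a fake reply and returns a
    random quality score standing in for a real per-turn reward signal."""
    next_state = DialogueState(history=state.history + [(llm.name, "reply")])
    quality = random.random()  # would come from a judge / task-success metric
    return next_state, quality

def rollout_value(state, llm, candidates, horizon, cost_weight, n_rollouts=8):
    """Estimate the cumulative (quality - cost) return of routing this turn
    to `llm`, then routing the remaining turns uniformly at random."""
    total = 0.0
    for _ in range(n_rollouts):
        s, r = simulate_turn(state, llm)
        value = r - cost_weight * llm.cost_per_turn
        for _ in range(horizon - 1):
            nxt = random.choice(candidates)  # crude random tail policy
            s, r = simulate_turn(s, nxt)
            value += r - cost_weight * nxt.cost_per_turn
        total += value
    return total / n_rollouts

def collect_routing_data(candidates, horizon=4, cost_weight=0.1):
    """Search phase: greedily pick the candidate with the best estimated
    long-horizon return at each turn, logging (state, choice) pairs that a
    lightweight routing policy could later be trained on."""
    state, dataset = DialogueState(), []
    for _ in range(horizon):
        best = max(candidates,
                   key=lambda c: rollout_value(state, c, candidates,
                                               horizon, cost_weight))
        dataset.append((list(state.history), best.name))
        state, _ = simulate_turn(state, best)
    return dataset

if __name__ == "__main__":
    pool = [CandidateLLM("qwen-7b", 0.2), CandidateLLM("llama-70b", 1.0)]
    for history, choice in collect_routing_data(pool):
        print(f"turn {len(history)}: route to {choice}")
```

The logged (dialogue state, chosen LLM) pairs are the kind of data a supervised routing policy would be distilled from; the paper's retrieval-based future state approximation, which lets the learned policy anticipate downstream turns without online search, is omitted from this sketch.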

Jiarui Zhang, Xiangyu Liu, Yong Hu, Chaoyue Niu, Hang Zeng, Shaojie Tang, Fan Wu, Guihai Chen • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multi-turn dialogue | ShareGPT | Success Rate (SR): 87.46 | 24 |
| Multi-turn dialogue | JDDC | Success Rate (SR): 83.45 | 24 |
| Multi-turn dialogue | MedDG | Success Rate (SR): 79.03 | 24 |
| Multi-turn dialogue | ShareGPT, JDDC, and MedDG (aggregated) | Average SR: 83.31 | 24 |
| Multi-turn dialogue routing | ShareGPT-LF, Qwen series set, cross-domain (legal and financial) | Success Rate (SR): 83.69 | 9 |
| Multi-turn dialogue routing | ShareGPT-LF, Llama series set, cross-domain (legal and financial) | Success Rate (SR): 83.03 | 9 |
| Multi-turn dialogue routing | ShareGPT, mixed candidate set (Qwen and Llama) | Success Rate (SR): 88.32 | 6 |
