RouteLMT: Learned Sample Routing for Hybrid LLM Translation Deployment
About
Large Language Models (LLMs) have achieved remarkable performance in Machine Translation (MT), but deploying them at scale remains prohibitively expensive. A widely adopted remedy is the hybrid system paradigm, which balances cost and quality by serving most requests with a small model and selectively routing a fraction to a large model. However, existing routing strategies often rely on heuristics, external predictors, or absolute quality estimation, which fail to capture whether the large model actually provides a worthwhile improvement over the small one. In this paper, we formulate routing as a budget allocation problem and identify marginal gain, i.e., the large model's improvement over the small model, as the optimal signal for budgeted decisions. Building on this, we propose RouteLMT (routing for LLM-based MT), an efficient in-model router that predicts this expected gain by probing the small translator's prompt-token representation, without requiring external models or hypothesis decoding. Extensive experiments demonstrate that RouteLMT outperforms heuristic and quality/difficulty estimation baselines, achieving a superior quality-budget Pareto frontier. Furthermore, we analyze regression risks and show that a simple guarded variant can mitigate severe quality losses.
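The budgeted decision described in the abstract can be illustrated with a minimal sketch: given a predicted marginal gain per request, send the highest-gain fraction of requests to the large model and keep the rest on the small one. This is not the paper's implementation. The function names, the top-k selection rule, and the toy scores are assumptions made for illustration, and the gains here are computed directly from the toy scores rather than predicted from prompt-token representations.

```python
from typing import List


def route_by_predicted_gain(predicted_gains: List[float], budget_fraction: float) -> List[bool]:
    """Return a routing mask: True means the request goes to the large model.

    Hypothetical helper: selects the top `budget_fraction` of requests
    ranked by predicted marginal gain (large-model quality minus
    small-model quality).
    """
    if not 0.0 <= budget_fraction <= 1.0:
        raise ValueError("budget_fraction must lie in [0, 1]")
    n = len(predicted_gains)
    k = int(budget_fraction * n)  # number of requests the budget allows (floored)
    # Rank request indices from highest to lowest predicted gain.
    order = sorted(range(n), key=lambda i: predicted_gains[i], reverse=True)
    chosen = set(order[:k])
    return [i in chosen for i in range(n)]


def average_quality(small_scores: List[float], large_scores: List[float],
                    to_large: List[bool]) -> float:
    """Mean quality of the hybrid system under a given routing mask."""
    picked = [l if r else s for s, l, r in zip(small_scores, large_scores, to_large)]
    return sum(picked) / len(picked)


# Toy quality scores (made up for illustration only).
small = [0.70, 0.80, 0.60, 0.90]
large = [0.75, 0.78, 0.85, 0.91]
# Stand-in for the router's output; a real router would predict these.
gains = [l - s for s, l in zip(small, large)]

mask = route_by_predicted_gain(gains, budget_fraction=0.5)
hybrid_quality = average_quality(small, large, mask)
```

With a budget of 50%, the two requests with the largest gain are routed to the large model. The request where the large model is slightly worse stays on the small model, which is the behavior an absolute quality estimate alone would not guarantee.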
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Budgeted Hybrid Routing | Medical En→Zh | Hit Rate@p | 47.66 | 12 |
| Budgeted Hybrid Routing | Medical En→Ru | Hit Rate@p | 54.72 | 12 |
| Budgeted Hybrid Routing | Medical Zh→En | Hit Rate@p | 53.03 | 12 |
| Budgeted Hybrid Routing | Medical Ru→En | Hit Rate@p | 51.57 | 12 |
| Budgeted Hybrid Routing | Medical Average Global | Spearman Correlation | 0.3 | 12 |
| Budgeted Hybrid Routing | Colloquial domain (test) | Spearman Correlation (Avg.) | 0.3 | 12 |
| Translation Routing | En-Zh | Hit Rate@p | 56.24 | 12 |
| Translation Routing | En-Ru | Hit Rate@p | 60.5 | 12 |
| Translation Routing | Zh-En | Hit Rate@p | 0.5564 | 12 |
| Translation Routing | Ru-En | Hit Rate@p | 56.93 | 12 |