Task-Aware LLM Routing with Multi-Level Task-Profile-Guided Data Synthesis for Cold-Start Scenarios
About
Large language models (LLMs) exhibit substantial variability in performance and computational cost across tasks and queries, motivating routing systems that select models to meet user-specific cost-performance trade-offs. However, existing routers generalize poorly in cold-start scenarios where in-domain training data is unavailable. We address this limitation with a multi-level task-profile-guided data synthesis framework that constructs a hierarchical task taxonomy and produces diverse question-answer pairs to approximate the test-time query distribution. Building on this, we introduce TRouter, a task-type-aware router that models query-conditioned cost and performance via latent task-type variables, with prior regularization derived from the synthesized task taxonomy. This design enhances TRouter's routing utility under both cold-start and in-domain settings. Across multiple benchmarks, we show that our synthesis framework alleviates cold-start issues and that TRouter delivers effective LLM routing.
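As a rough illustration of routing under a cost-performance trade-off with latent task types, the sketch below marginalizes per-task-type performance and cost over a task-type posterior and picks the model with the highest utility. It is not TRouter's actual formulation: the linear utility `performance - lam * cost`, the function names, and the `PERF` and `COST` tables are all hypothetical choices made for this example.

```python
# Hypothetical tables: rows are models, columns are latent task types.
# PERF[m][t] is an assumed expected quality, COST[m][t] an assumed expected cost.
PERF = [[0.9, 0.8],   # model 0: stronger but expensive
        [0.6, 0.7]]   # model 1: weaker but cheap
COST = [[1.0, 1.0],
        [0.1, 0.1]]


def expected_value(row, task_posterior):
    """Marginalize a per-task-type row over p(task type | query)."""
    return sum(p * v for p, v in zip(task_posterior, row))


def route(task_posterior, perf, cost, lam):
    """Return (best model index, per-model utilities).

    task_posterior: p(task type | query), assumed to sum to 1.
    lam: user-specific weight on cost; larger means more cost-sensitive.
    """
    utilities = [
        expected_value(perf[m], task_posterior)
        - lam * expected_value(cost[m], task_posterior)
        for m in range(len(perf))
    ]
    best = max(range(len(utilities)), key=utilities.__getitem__)
    return best, utilities


posterior = [0.5, 0.5]  # assumed output of some task-type classifier
quality_first, _ = route(posterior, PERF, COST, lam=0.0)
cost_aware, _ = route(posterior, PERF, COST, lam=0.5)
```

With these made-up numbers, ignoring cost (`lam=0.0`) selects the stronger model, while a moderate cost weight (`lam=0.5`) switches the choice to the cheaper one.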
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Code Generation | MBPP | Pass@1 | 69.8 | 211 |
| Code Generation | HumanEval | Pass@1 | 72.4 | 145 |
| General AI Assistant Tasks | GAIA | Pass@1 Score | 17.2 | 38 |
| Knowledge retrieval | Knowledge Macro-aggregate | Pass@1 | 56.3 | 22 |
| Reading Comprehension | Reading Macro-aggregate | Pass@1 | 46 | 22 |
| Math problem solving | Math Macro-aggregate | Pass@1 | 38.6 | 22 |
| Agentic Tool-use | Agentic Macro-aggregate | Pass@1 | 23.8 | 22 |
| Code and Software Engineering | Code/SE Macro-aggregate | Pass@1 | 43.9 | 22 |
| Mathematical Reasoning | AIME | Pass@1 | 16.4 | 16 |
| Language Model Routing and Orchestration | Blind pool | Average Precision (p@1) | 42.41 | 16 |