Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation
About
Large language model (LLM) multi-agent systems typically rely on rigid orchestration, committing either to flat per-query routing or to hand-engineered task decomposition, so decomposition depth, worker choice, and inference budget are not jointly optimized under a single objective. We introduce Uno-Orchestra, a unified orchestration policy that selectively decomposes a task and dispatches each subtask to an admissible (model, primitive) pair; both decisions are learned jointly from curated reinforcement learning (RL) trajectories grounded in real worker interactions. Compared with 22 baselines on a 13-benchmark suite spanning math, code, knowledge, long-context, and agentic tool-use, Uno-Orchestra reaches 77.0% macro pass@1, roughly 16% above the strongest workflow baseline, at roughly an order of magnitude lower per-query cost, advancing the accuracy-efficiency frontier of selective delegation.
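The abstract couples two decisions: whether to decompose a task, and which admissible (model, primitive) pair handles each piece, under one accuracy-versus-budget objective. The sketch below is a hand-written stand-in for that learned policy, not Uno-Orchestra's method. Everything in it is hypothetical: the `Worker` fields, the success estimates `p_success`, the cost weight `lam`, and the rule that a decomposition succeeds only if every subtask succeeds, with subtasks failing independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    """Hypothetical admissible (model, primitive) pair with assumed estimates."""
    model: str
    primitive: str
    p_success: float  # assumed probability this pair solves its (sub)task
    cost: float       # assumed per-call cost, in arbitrary budget units


def utility(w: Worker, lam: float) -> float:
    """Single scalar objective: expected success minus lam times cost."""
    return w.p_success - lam * w.cost


def best_pair(admissible: list[Worker], lam: float) -> Worker:
    """Pick the admissible pair with the highest utility."""
    if not admissible:
        raise ValueError("no admissible (model, primitive) pair")
    return max(admissible, key=lambda w: utility(w, lam))


def route(
    task_pairs: list[Worker],
    subtask_pairs: list[list[Worker]],
    lam: float,
) -> tuple[str, list[Worker]]:
    """Choose between dispatching the whole task and decomposing it.

    task_pairs: admissible pairs for the undecomposed task.
    subtask_pairs: one list of admissible pairs per candidate subtask
        (an empty list means no decomposition is available).
    """
    direct = best_pair(task_pairs, lam)
    if subtask_pairs:
        picks = [best_pair(pairs, lam) for pairs in subtask_pairs]
        # Hypothetical assumption: all subtasks must succeed, independently.
        p_all = 1.0
        for w in picks:
            p_all *= w.p_success
        decomposed_u = p_all - lam * sum(w.cost for w in picks)
        # Decompose only when it beats direct dispatch on the same objective.
        if decomposed_u > utility(direct, lam):
            return "decompose", picks
    return "direct", [direct]


if __name__ == "__main__":
    big = Worker("large-model", "solve", 0.80, 0.50)
    plan = [Worker("small-model", "plan", 0.95, 0.02),
            Worker("large-model", "plan", 0.98, 0.40)]
    run = [Worker("small-model", "code-exec", 0.95, 0.03)]
    mode, picks = route([big], [plan, run], lam=0.2)
    print(mode, [f"{w.model}/{w.primitive}" for w in picks])
    # prints: decompose ['small-model/plan', 'small-model/code-exec']
```

With these made-up numbers, two cheap subtask calls beat one expensive direct call (utility of about 0.89 versus 0.70). If the subtask success estimates were low, the product term would shrink and the same rule would fall back to direct dispatch. This is a crude, rule-based picture of "selective" delegation. The paper instead learns both choices from RL trajectories.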
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Code Generation | MBPP | Pass@1 | 92.4 | 211 |
| Code Generation | HumanEval | Pass@1 | 93.1 | 145 |
| General AI Assistant Tasks | GAIA | Pass@1 Score | 82 | 38 |
| Agentic Tool-use | Agentic Macro-aggregate | Pass@1 | 70.3 | 22 |
| Code and Software Engineering | Code/SE Macro-aggregate | Pass@1 | 77.8 | 22 |
| Knowledge retrieval | Knowledge Macro-aggregate | Pass@1 | 80.5 | 22 |
| Math problem solving | Math Macro-aggregate | Pass@1 | 79.2 | 22 |
| Reading Comprehension | Reading Macro-aggregate | Pass@1 | 79.7 | 22 |
| Mathematical Reasoning | AIME | Pass@1 | 66.5 | 16 |
| Software Engineering | SWE-Bench | Resolve Rate | 81.8 | 16 |
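The table reports domain-level "Macro-aggregate" rows, and the abstract reports 77.0% macro pass@1 over 13 benchmarks. Assume "macro" has its usual meaning: an unweighted mean of per-benchmark scores, as opposed to a pooled, problem-weighted "micro" score. Under that assumption, the difference can be sketched as follows. The benchmark names and counts are hypothetical, not figures from the paper.

```python
from statistics import fmean


def macro_pass_at_1(results: dict[str, tuple[int, int]]) -> float:
    """Unweighted mean of per-benchmark pass@1, in percent.

    results maps a benchmark name to (problems solved, problems attempted).
    Every benchmark counts equally, regardless of its size.
    """
    return 100 * fmean(solved / total for solved, total in results.values())


def micro_pass_at_1(results: dict[str, tuple[int, int]]) -> float:
    """Pooled pass@1 over all problems, in percent (larger benchmarks weigh more)."""
    solved = sum(s for s, _ in results.values())
    total = sum(t for _, t in results.values())
    return 100 * solved / total


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    toy = {"bench-A": (90, 100), "bench-B": (30, 60)}
    print(round(macro_pass_at_1(toy), 1))  # 70.0
    print(round(micro_pass_at_1(toy), 1))  # 75.0
```

Macro averaging keeps a smaller benchmark from being swamped by larger ones. As a result, the two scores can diverge on the same underlying results (70.0 versus 75.0 in this toy case).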