
EvoRoute: Experience-Driven Self-Routing LLM Agent Systems

About

Complex agentic AI systems, powered by a coordinated ensemble of Large Language Models (LLMs), tool and memory modules, have demonstrated remarkable capabilities on intricate, multi-turn tasks. However, this success is shadowed by prohibitive economic costs and severe latency, exposing a critical, yet underexplored, trade-off. We formalize this challenge as the Agent System Trilemma: the inherent tension among achieving state-of-the-art performance, minimizing monetary cost, and ensuring rapid task completion. To dismantle this trilemma, we introduce EvoRoute, a self-evolving model routing paradigm that transcends static, pre-defined model assignments. Leveraging an ever-expanding knowledge base of prior experience, EvoRoute dynamically selects Pareto-optimal LLM backbones at each step, balancing accuracy, efficiency, and resource use, while continually refining its own selection policy through environment feedback. Experiments on challenging agentic benchmarks such as GAIA and BrowseComp+ demonstrate that EvoRoute, when integrated into off-the-shelf agentic systems, not only sustains or enhances system performance but also reduces execution cost by up to 80% and latency by over 70%.
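
To make the notion of a Pareto-optimal backbone concrete, here is a minimal sketch of Pareto filtering over three objectives (accuracy, cost, latency). It is not EvoRoute's routing policy, which the abstract does not specify. The `Candidate` class, the model names, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-model estimates for one agent step (illustrative values only)."""
    name: str
    accuracy: float  # higher is better
    cost: float      # lower is better, e.g. USD per call
    latency: float   # lower is better, e.g. seconds


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on every axis and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost and a.latency <= b.latency
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost or a.latency < b.latency
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Assumed example data: placeholder names, not real models or measurements.
candidates = [
    Candidate("model-large", accuracy=0.90, cost=0.050, latency=12.0),
    Candidate("model-medium", accuracy=0.85, cost=0.010, latency=5.0),
    Candidate("model-small", accuracy=0.70, cost=0.002, latency=1.5),
    Candidate("model-legacy", accuracy=0.65, cost=0.020, latency=6.0),  # dominated by model-medium
]

front = pareto_front(candidates)
print([c.name for c in front])  # ['model-large', 'model-medium', 'model-small']
```

A router in this spirit would still need a rule for choosing one model from the front at each step, for example a weighting over the three objectives. That choice is left open here because the abstract does not describe it.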

Guibin Zhang, Haiyang Yu, Kaiming Yang, Bingli Wu, Fei Huang, Yongbin Li, Shuicheng Yan • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multi-task Evaluation | Aggregate All tasks (summary) | Score: 74.6 | 20 |
| General AI Assistant Tasks | GAIA All levels original (test) | Performance (%): 63.19 | 15 |
| General AI Assistant Tasks | GAIA Level 1 original (test) | Performance (%): 83.02 | 15 |
| General AI Assistant Tasks | GAIA Level 2 original (test) | Performance (%): 59.3 | 15 |
| Web Browsing and Tool Use | BrowseComp+ original (test) | Performance (%): 38.72 | 15 |
| General AI Assistant Tasks | GAIA Level 3 original (test) | Performance: 33.33 | 15 |
| Data Science | DS-1000 | Performance Score: 56.5 | 8 |
| Medical Reasoning | DDXPlus | Performance Score: 79.5 | 8 |
| Web Search | HotpotQA | Performance Score: 87.8 | 8 |
