Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference

About

Semantic routers in LLM inference gateways select tools in the critical request path, where every millisecond of added latency compounds across millions of requests. We propose Outcome-Aware Tool Selection (OATS), which interpolates tool embeddings toward the centroid of queries where they historically succeed, an offline process that adds no parameters, latency, or GPU cost at serving time. On MetaTool (199 tools, 4,287 queries), this improves NDCG@5 from 0.869 to 0.940; on ToolBench (2,413 APIs), from 0.834 to 0.848. We also evaluate two learned extensions: a 2,625-parameter MLP re-ranker and a 197K-parameter contrastive adapter. The MLP re-ranker hurts or matches baseline when outcome data is sparse relative to the tool set; the contrastive adapter provides comparable gains on MetaTool (NDCG@5: 0.931). All methods are evaluated on the same held-out 30% test split. The practical takeaway is to start with the zero-cost refinement and add learned components only when data density warrants it. All mechanisms run within single-digit millisecond CPU budgets.
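As a rough illustration of the offline refinement described above, the following sketch interpolates each tool embedding toward the centroid of the queries on which that tool succeeded, then ranks tools by cosine similarity. It is not the authors' implementation: the interpolation weight `alpha`, the `min_successes` threshold, the re-normalization step, the function names, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# One logged outcome: (query embedding, tool name, whether the call succeeded).
Outcome = Tuple[Sequence[float], str, bool]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def refine_tool_embeddings(
    tool_emb: Dict[str, Sequence[float]],
    outcomes: Sequence[Outcome],
    alpha: float = 0.5,       # hypothetical interpolation weight
    min_successes: int = 1,   # hypothetical threshold for sparse tools
) -> Dict[str, Vector]:
    """Offline step: pull each tool toward the centroid of its successful queries."""
    successes: Dict[str, List[Sequence[float]]] = {}
    for query_emb, tool, ok in outcomes:
        if ok and tool in tool_emb:
            successes.setdefault(tool, []).append(query_emb)

    refined: Dict[str, Vector] = {}
    for tool, emb in tool_emb.items():
        hits = successes.get(tool, [])
        if len(hits) < min_successes:
            # Too little outcome data: keep the original direction.
            refined[tool] = l2_normalize(emb)
            continue
        c = centroid(hits)
        mixed = [(1 - alpha) * e + alpha * ci for e, ci in zip(emb, c)]
        refined[tool] = l2_normalize(mixed)  # assumption: re-normalize after mixing
    return refined


def rank_tools(query_emb: Sequence[float],
               tool_emb: Dict[str, Sequence[float]],
               k: int = 5) -> List[str]:
    """Serving step: top-k tools by cosine similarity; no extra parameters."""
    q = l2_normalize(query_emb)
    scored = []
    for name, emb in tool_emb.items():
        e = l2_normalize(emb)
        scored.append((sum(a * b for a, b in zip(q, e)), name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Toy data (hypothetical): two tools and a log in which "weather" succeeded
# on queries that sit closer to the "calendar" description embedding.
tools = {"weather": [1.0, 0.0], "calendar": [0.0, 1.0]}
log = [
    ([0.6, 0.8], "weather", True),
    ([0.6, 0.8], "weather", True),
    ([0.6, 0.8], "calendar", False),  # failures are ignored by this sketch
]
query = [0.6, 0.8]

refined = refine_tool_embeddings(tools, log, alpha=0.5)
baseline_top = rank_tools(query, tools, k=1)
refined_top = rank_tools(query, refined, k=1)
```

In this toy case the baseline ranks `calendar` first, while the refined embeddings rank `weather` first. The serving path is unchanged: it is still a similarity lookup over the same number of vectors, which is consistent with the abstract's claim that the refinement adds no parameters or latency at serving time.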

Huamin Chen, Xunzhuo Liu, Junchen Jiang, Bowei He, Xue Liu • 2026

Related benchmarks

Task	Dataset	Metric	Result	Rank
Tool selection	MetaTool similar choices subtask (test)	Accuracy	83.4	8
Tool selection	MetaTool 199 tools, 1,287 queries (30% test)	R@1	83	7
Tool selection	ToolBench 30% 2,413 tools, 180 queries (test)	Recall@1	38.7	7
