RACER: Risk-Aware Calibrated Efficient Routing for Large Language Models

About

Efficiently routing queries to the optimal large language model (LLM) is crucial for optimizing the cost-performance trade-off in multi-model systems. However, most existing routers rely on single-model selection, making them susceptible to misrouting. In this work, we formulate LLM routing as the $\alpha$-VOR problem, which minimizes the expected set size while controlling the misrouting risk, and propose RACER, a novel method that extends base routers to output model sets whose responses can subsequently be aggregated for improved performance. In particular, RACER constructs nested model sets via augmented scoring and uses finite-sample concentration bounds to calibrate a threshold that allows for both variable set sizes and abstention. We theoretically prove that RACER achieves rigorous distribution-free risk control on unseen test data in a post-hoc, model-agnostic manner. Extensive experiments verify our theoretical guarantees and demonstrate that RACER consistently improves downstream accuracy across a wide range of benchmarks.
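To make the calibration step concrete, here is a minimal sketch of how a threshold could be calibrated from nested, score-based model sets under a Hoeffding-style finite-sample bound, consistent with the abstract's description. This is an illustrative reconstruction, not the paper's implementation: `calibrate_threshold`, `route`, the specific concentration bound, and all variable names are assumptions.

```python
# Hypothetical sketch of set-valued LLM routing calibration, in the spirit of
# the abstract: nested model sets from router scores, with a finite-sample
# (Hoeffding-style) correction so that the misrouting risk on unseen queries
# stays below a target level alpha. Names are illustrative, not the authors' API.
import numpy as np

def calibrate_threshold(scores, correct, alpha, delta=0.05):
    """Pick the largest threshold t such that routing each calibration query
    to the model set {m : scores[i, m] >= t} keeps the empirical misrouting
    risk, inflated by a Hoeffding term, at or below alpha.

    scores  : (n, M) router scores for n calibration queries and M models.
    correct : (n, M) booleans; correct[i, m] = True if model m answers query i well.
    alpha   : target misrouting risk (probability the set contains no good model).
    delta   : confidence level for the concentration bound.
    """
    n, _ = scores.shape
    hoeffding = np.sqrt(np.log(1.0 / delta) / (2.0 * n))  # finite-sample slack
    # Candidate thresholds: all observed scores, from high (small sets) to low.
    for t in np.sort(np.unique(scores))[::-1]:
        sets = scores >= t                 # nested: lowering t only grows sets
        # A query is misrouted if its set contains no correct model.
        miss = ~np.any(sets & correct, axis=1)
        if miss.mean() + hoeffding <= alpha:
            return t                       # largest t that controls the risk
    return None                            # no valid threshold: always abstain

def route(query_scores, t):
    """Model indices selected for one query; an empty list means abstention."""
    if t is None:
        return []
    return [m for m, s in enumerate(query_scores) if s >= t]
```

Because the sets are nested in the threshold, the empirical risk is monotone, so scanning candidate thresholds from high to low returns the smallest sets compatible with the risk budget; the outputs of the selected models can then be aggregated downstream.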

Sai Hao, Hao Zeng, Hongxin Wei, Bingyi Jing · 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Question Answering | ARC Challenge | Accuracy: 56.8 | 906 |
| Multitask Language Understanding | MMLU | Accuracy: 63.7 | 413 |
| Multitask Language Understanding | C-MMLU | Accuracy: 50.7 | 16 |
