
RouterBench

Benchmarks

Task Name                        Dataset Name                   SOTA Result                     Trend
Test-Time Scaling Selection      RouterBench Quality-Priority   Reward 0.7924                   16
Test-Time Scaling Selection      RouterBench Cost-Sensitive     Reward 0.8337                   16
Predictive LLM Routing           RouterBench                    AUC 91.91                       14
LLM Routing                      RouterBench                    QNC 1.66                        14
Text-based LLM Routing           RouterBench                    Utility Score 55.58             12
Routing                          RouterBench (test)             Accuracy 91.4                   11
Ranking quality gain estimation  RouterBench                    Ranking Quality Gain (%) 31.05  9
LLM Routing                      RouterBench Out-of-domain      nAUC 75.6                       9
Predictive Model Routing         RouterBench Quality-Priority   Reward 0.6121                   8
Predictive Model Routing         RouterBench Cost-Sensitive     Reward 0.6226                   8
Aggregate Model Evaluation       RouterBench subsampled 2500 s  Accuracy 79.1                   8
LLM Routing                      RouterBench held-out (test)    Accuracy 91.3                   6
Routing                          RouterBench                    Accuracy -                      0