LLM-as-a-Judge Routing on 3 datasets Average (test)
[Figure: Accuracy chart. Top entry: RACER, 90 (May 11, 2026). Metrics shown: Accuracy, Cost.]
Updated 22d ago
Evaluation Results
Method           Setting                      Date     Accuracy  Cost
RACER            Model Scale=8B, Budget...    2026.05  90        3.9
M-IRT            Model Scale=8B, Budget...    2026.05  88.9      3.4
RouteLLM-MF      Model Scale=8B, Budget...    2026.05  88.2      4.1
RouterBench-KNN  Model Scale=8B, Budget...    2026.05  86.8      2.6
RACER            Model Scale=4B, Budget...    2026.05  85.8      3.4
RouteLLM-MF      Model Scale=4B, Budget...    2026.05  84.7      3.4
M-IRT            Model Scale=4B, Budget...    2026.05  84.3      2.7
RouterBench-KNN  Model Scale=4B, Budget...    2026.05  84.1      2.5
RACER            Model Scale=1.7B, Budg...    2026.05  72.2      3.6
M-IRT            Model Scale=1.7B, Budg...    2026.05  71.6      3.4
RouterBench-KNN  Model Scale=1.7B, Budg...    2026.05  71.3      2.6
RouteLLM-MF      Model Scale=1.7B, Budg...    2026.05  69.4      3.8