| MMR-Bench | | nAUC 0.918 | | 37 | 21d ago |
| Heterogeneous four-agent system Gemma, Phi, Qwen (test) | | Accuracy 94 | | 27 | 3mo ago |
| CEVAL and GSM8K (OOD) | NIRT-Router | Performance 87.46 | | 21 | 3mo ago |
| MMLU, CMMLU, etc. In-distribution | NIRT-Router | Performance 80.69 | | 21 | 3mo ago |
| Cold-start | | Cost (Cost-first) 0.0226 | | 15 | 1mo ago |
| Agentic Evaluation (test) | | Accuracy 89.65 | | 14 | 23d ago |
| Average across Benchmarks (val) | | Avg Top-1 Acc 83 | | 14 | 3mo ago |
| BBEH (val) | | Top-1 Acc 66.4 | | 14 | 3mo ago |
| MEDMCQA (val) | | Top-1 Acc 96.3 | | 14 | 3mo ago |
| SUPERGPQA (val) | | Top-1 Acc 0.776 | | 14 | 3mo ago |
| MMLU-PRO (val) | | Top-1 Acc 91.5 | | 14 | 3mo ago |
| BBEH | | Top-1 Accuracy 66.4 | | 14 | 3mo ago |
| MedMCQA | | Top-1 Acc 96.3 | | 14 | 3mo ago |
| SUPERGPQA | | Top-1 Acc 77.6 | | 14 | 3mo ago |
| MMLU-PRO | | Top-1 Acc 91.5 | | 14 | 3mo ago |
| MMLU-PRO, SUPERGPQA, MEDMCQA, BBEH (test) | | MMLU-PRO Top-1 Acc 91.5 | | 14 | 3mo ago |
| RouterBench | CSCR | QNC 1.66 | | 14 | 1mo ago |
| In-domain datasets Cost First, alpha=0.8 | | Accuracy 93 | | 11 | 3mo ago |
| In-domain datasets Balance, alpha=0.5 | | Accuracy 93 | | 11 | 3mo ago |
| In-domain datasets Performance First, alpha=0.2 | | Accuracy 93 | | 11 | 3mo ago |
| OOD | | Accuracy 89 | | 11 | 3mo ago |
| OOD datasets (test) | | Accuracy 89 | | 11 | 3mo ago |
| RouterArena (Evaluation set) | | Arena S Score 80.72 | | 9 | 2d ago |
| Mixed-domain ShareGPT, WildChat, Chatbot Arena 10k episodes (test) | | Average Cost 914 | | 9 | 8d ago |
| Six datasets Average | | Macro Accuracy 75.9 | | 9 | 21d ago |