| Heterogeneous four-agent system Gemma, Phi, Qwen (test) | | Accuracy 94 | | 27 | 1mo ago |
| CEVAL and GSM8K (OOD) | NIRT-Router | Performance 87.46 | | 21 | 1mo ago |
| MMLU, CMMLU, etc. In-distribution | NIRT-Router | Performance 80.69 | | 21 | 1mo ago |
| Cold-start | | Cost (Cost-first) 0.0226 | | 15 | 5d ago |
| Average across Benchmarks (val) | | Avg Top-1 Acc 83 | | 14 | 1mo ago |
| BBEH (val) | | Top-1 Acc 66.4 | | 14 | 1mo ago |
| MEDMCQA (val) | | Top-1 Acc 96.3 | | 14 | 1mo ago |
| SUPERGPQA (val) | | Top-1 Acc 0.776 | | 14 | 1mo ago |
| MMLU-PRO (val) | | Top-1 Acc 91.5 | | 14 | 1mo ago |
| BBEH | | Top-1 Accuracy 66.4 | | 14 | 1mo ago |
| MedMCQA | | Top-1 Acc 96.3 | | 14 | 1mo ago |
| SUPERGPQA | | Top-1 Acc 77.6 | | 14 | 1mo ago |
| MMLU-PRO | | Top-1 Acc 91.5 | | 14 | 1mo ago |
| MMLU-PRO, SUPERGPQA, MEDMCQA, BBEH (test) | | MMLU-PRO Top-1 Acc 91.5 | | 14 | 1mo ago |
| In-domain datasets Cost First, alpha=0.8 | | Accuracy 93 | | 11 | 1mo ago |
| In-domain datasets Balance, alpha=0.5 | | Accuracy 93 | | 11 | 1mo ago |
| In-domain datasets Performance First, alpha=0.2 | | Accuracy 93 | | 11 | 1mo ago |
| OOD | | Accuracy 89 | | 11 | 1mo ago |
| OOD datasets (test) | | Accuracy 89 | | 11 | 1mo ago |
| MMR-Bench | EquiRouter | nAUC 0.7059 | | 11 | 1mo ago |
| RouterBench | EquiRouter | nAUC 0.7712 | | 11 | 1mo ago |
| MMR-Bench Out-of-domain | EquiRouter | nAUC 0.6701 | | 9 | 1mo ago |
| RouterBench Out-of-domain | EquiRouter | nAUC 75.6 | | 9 | 1mo ago |
| LLM Routing In-domain (test) | FrugalGPT | Cost (Cost-first) 0.023 | | 8 | 5d ago |
| GPT series models Out of Domain | | Accuracy 82 | | 8 | 1mo ago |