| Model Routing Suite: MathQA, LogiQA, MedQA, PIQA, TruthQA, MMLU, GSM8k, GPQA, ASDiv, SoQA | KMeans | Overall Accuracy 66.2 | | 18 | 1mo ago |
| NB-Curated OOD | | Uniqueness 35.4 | | 11 | 15d ago |
| NB-WildChat | | Uniqueness Score 42.6 | | 11 | 15d ago |
| OOD Set: AIME, Humanity's Last Exam, SimpleQA, OlympiadBench (test) | SCOPE | Avg. A 50.8 | | 11 | 1mo ago |
| SCOPE-60K 5% split (test) | SCOPE | Avg. Accuracy 75 | | 11 | 1mo ago |
| Model-Query Evaluation (test) | LOCUS | Routing Accuracy (%) 64.7 | | 9 | 1mo ago |
| Global Routing Dataset Mixed Pool | SharedTrunkNet | P-AUCCC 0.2323 | | 7 | 25d ago |
| Global Routing Dataset Small Pool | SharedTrunkNet | P-AUCCC 0.5472 | | 7 | 25d ago |
| Global Routing Dataset Frontier Pool | SharedTrunkNet | P-AUCCC 0.4377 | | 7 | 25d ago |
| Small Pool | SharedTrunkNet | Oracle Accuracy 92 | | 6 | 25d ago |
| Frontier Pool | SharedTrunkNet | Oracle Accuracy 89.3 | | 6 | 25d ago |
| Mixed pool | SharedTrunkNet | Mean per-model AUC 0.8817 | | 6 | 25d ago |
| Small pool | SharedTrunkNet | Mean per-model AUC 82.6 | | 6 | 25d ago |
| Frontier pool | SharedTrunkNet | Mean per-model AUC 0.856 | | 6 | 25d ago |
| FRAMES (ID) | | CPT (80%) 60.92 | | 4 | 1mo ago |
| MATH 500 (ID) | | CPT (80%) 53.79 | | 4 | 1mo ago |
| AIME (ID) | | CPT (80%) 74.45 | | 4 | 1mo ago |
| LSAT (ID) | | CPT (80%) 60.9 | | 4 | 1mo ago |
| MMLU-Pro (ID) | | CPT (80%) 68.24 | | 4 | 1mo ago |
| MMLU-Redux (ID) | | CPT (80%) 52.93 | | 4 | 1mo ago |
| MMLU (ID) | | CPT (80%) 52.61 | | 4 | 1mo ago |
| GPQA Diamond (ID) | | CPT (80%) 62.69 | | 4 | 1mo ago |
| FRAMES (ID queries) | | CPT (85%) Score 69.41 | | 4 | 1mo ago |
| MATH-500 (ID queries) | | CPT (85%) 64.97 | | 4 | 1mo ago |
| AIME ID queries | | CPT (85%) 80.84 | | 4 | 1mo ago |