| Dataset Name | SOTA Method | Metric | Score | | Updated |
|---|---|---|---|---|---|
| TriviaQA | | AUROC | 0.999 | 113 | 27d ago |
| GSM8K | Qwen 2.5 | AUROC | 80.7 | 33 | 23d ago |
| PIQA | GRIFFIN | Accuracy | 80.69 | 28 | 20d ago |
| Medals | Direction | AUROC | 77 | 18 | 3mo ago |
| Math operations | Verb. conf. | AUROC | 0.913 | 18 | 3mo ago |
| Cities | Direction | AUROC | 88 | 18 | 3mo ago |
| Notable People | Direction | AUROC | 82.5 | 18 | 3mo ago |
| SoQA | KMEANS | Accuracy | 67.85 | 18 | 3mo ago |
| ASDiv | KMEANS | Accuracy | 96.74 | 18 | 3mo ago |
| GPQA | LOCUS | Accuracy | 79.42 | 18 | 3mo ago |
| GSM8K | KMEANS | Accuracy | 74.04 | 18 | 3mo ago |
| MMLU | IRT-NET | Accuracy | 65.83 | 18 | 3mo ago |
| TruthQA | LOCUS | Accuracy | 69.12 | 18 | 3mo ago |
| MedQA | KMEANS | Accuracy | 61.29 | 18 | 3mo ago |
| LogiQA | LOCUS | Accuracy | 67.75 | 18 | 3mo ago |
| MathQA | KMEANS | Accuracy | 66.15 | 18 | 3mo ago |
| Overall Combined Datasets | IRT-NET | Accuracy | 70.12 | 18 | 3mo ago |
| ProntoQA | Llama 3.1 | AUROC | 79.9 | 15 | 23d ago |
| Global Pooled Datasets | Op-XGB | WP-AUC | 0.723 | 12 | 5d ago |
| MMLU Pro | Op-XGB | WP-AUC | 0.639 | 12 | 5d ago |
| MATH | OST | WP-AUC | 0.662 | 12 | 5d ago |
| LiveCodeBench | Op-XGB | WP-AUC | 79.5 | 12 | 5d ago |
| GPQA | Op-XGB | WP-AUC | 0.703 | 12 | 5d ago |
| ARC Challenge | OST | WP-AUC | 64.5 | 12 | 5d ago |
| AIME | Op-XGB | WP-AUC | 0.838 | 12 | 5d ago |