| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| TriviaQA | | AUROC | 0.852 | 45 | 18d ago |
| GSM8K | Direction | AUROC | 60.1 | 18 | 1mo ago |
| Medals | Direction | AUROC | 77 | 18 | 1mo ago |
| Math operations | Verb. conf. | AUROC | 0.913 | 18 | 1mo ago |
| Cities | Direction | AUROC | 88 | 18 | 1mo ago |
| Notable People | Direction | AUROC | 82.5 | 18 | 1mo ago |
| SoQA | KMEANS | Accuracy | 67.85 | 18 | 1mo ago |
| ASDiv | KMEANS | Accuracy | 96.74 | 18 | 1mo ago |
| GPQA | LOCUS | Accuracy | 79.42 | 18 | 1mo ago |
| GSM8K | KMEANS | Accuracy | 74.04 | 18 | 1mo ago |
| MMLU | IRT-NET | Accuracy | 65.83 | 18 | 1mo ago |
| TruthQA | LOCUS | Accuracy | 69.12 | 18 | 1mo ago |
| PIQA | KMEANS | Accuracy | 79.64 | 18 | 1mo ago |
| MedQA | KMEANS | Accuracy | 61.29 | 18 | 1mo ago |
| LogiQA | LOCUS | Accuracy | 67.75 | 18 | 1mo ago |
| MathQA | KMEANS | Accuracy | 66.15 | 18 | 1mo ago |
| Overall Combined Datasets | IRT-NET | Accuracy | 70.12 | 18 | 1mo ago |
| Model-Query Evaluation (112 language models, 10 public benchmarks) (test) | IRT-NET | Accuracy (Prediction) | 70.12 | 9 | 1mo ago |
| GSM8K (test) | Trajectory features | Best-layer ROC-AUC | 0.852 | 3 | 11d ago |