| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| SoQA | KMEANS | Accuracy | 67.85 | 18 | 4d ago |
| ASDiv | KMEANS | Accuracy | 96.74 | 18 | 4d ago |
| GPQA | LOCUS | Accuracy | 79.42 | 18 | 4d ago |
| GSM8k | KMEANS | Accuracy | 74.04 | 18 | 4d ago |
| MMLU | IRT-NET | Accuracy | 65.83 | 18 | 4d ago |
| TruthQA | LOCUS | Accuracy | 69.12 | 18 | 4d ago |
| PIQA | KMEANS | Accuracy | 79.64 | 18 | 4d ago |
| MedQA | KMEANS | Accuracy | 61.29 | 18 | 4d ago |
| LogiQA | LOCUS | Accuracy | 67.75 | 18 | 4d ago |
| MathQA | KMEANS | Accuracy | 66.15 | 18 | 4d ago |
| Overall Combined Datasets | IRT-NET | Accuracy | 70.12 | 18 | 4d ago |
| Model-Query Evaluation (112 language models, 10 public benchmarks) (test) | IRT-NET | Accuracy (Prediction) | 70.12 | 9 | 4d ago |