| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| TriviaQA (test) | SE | AUROC 82.12 | 78 | 3d ago |
| JudgeBench (test) | Energy | AUROC 71.53 | 77 | 3d ago |
| CoQA (test) | SAR | AUROC 77.3 | 42 | 3d ago |
| TriviaQA | SE + MARS | AUROC 83.63 | 37 | 3d ago |
| SciQA | SAR-s | AUROC 0.7572 | 36 | 3d ago |
| WebQA | SE + MARS | AUROC 73.57 | 30 | 3d ago |
| NaturalQA | SE + MARS | AUROC 75.5 | 30 | 3d ago |
| CIFAR10-C Benign Stream | SICL | ECE 6.4 | 16 | 3d ago |
| PopQA | EigV+MCH | A1 83.09 | 16 | 3d ago |
| NQ-open | EigV+MCH | A1 Accuracy 88.67 | 16 | 3d ago |
| Solar 1Y (test) | EnbPI | $\Delta$ Cov -0.002 | 8 | 3d ago |
| JSRT (test) | | Correlation 0.89 | 8 | 3d ago |
| HMC-QU (test) | CRISP | Correlation Coefficient 0.41 | 8 | 3d ago |
| CAMUS (test) | CRISP-MC | Correlation 0.78 | 8 | 3d ago |
| SVAMP | BSDETECTOR | AUROC 93.6 | 7 | 3d ago |
| CSQA | BSDETECTOR | AUROC 0.769 | 7 | 3d ago |
| GSM8K | BSDETECTOR | AUROC 0.951 | 7 | 3d ago |
| Synthetic a=1 10-fold (test) | DiffPO | 95% PI 0.908 | 6 | 3d ago |
| Synthetic a=0 10-fold (test) | DiffPO | 95% PI Coverage 98.1 | 6 | 3d ago |
| NYUD2-DIR Few-shot | VIR | NLL 4.113 | 5 | 3d ago |
| NYUD2-DIR Medium-shot | VIR | NLL 2.727 | 5 | 3d ago |
| NYUD2-DIR Many-shot | VIR | NLL 2.815 | 5 | 3d ago |
| NYUD2-DIR All-shot | VIR | NLL 3.866 | 5 | 3d ago |
| STS-B-DIR Few | VIR | NLL 2.152 | 5 | 3d ago |
| STS-B-DIR Medium | VIR | NLL 2.754 | 5 | 3d ago |