| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| TriviaQA | PCNET | AUROC 0.98 | 621 | 17h ago | |
| HotpotQA | PR | AUROC 0.928 | 249 | 6d ago | |
| TriviaQA (test) | MultiHaluDet | AUC-ROC 98.3 | 243 | 20h ago | |
| TruthfulQA | ARS (CCS) | AUC (ROC) 0.9417 | 178 | 14d ago | |
| HaluEval (test) | MultiHaluDet | AUC-ROC 98.55 | 176 | 20h ago | |
| NQ | Max Pooling | AUC 0.889 | 154 | 18d ago | |
| HaluEval | Stacking | AUROC 1 | 131 | 17h ago | |
| GSM8K | ARS (CCS) | AUROC 90.37 | 115 | 5d ago | |
| TruthfulQA (test) | SDES | AUC-ROC 89.5 | 112 | 26d ago | |
| CoQA | Curvature | AUROC 84.92 | 108 | 23h ago | |
| CoQA | SpikeScore | Mean AUROC 0.8584 | 107 | 6d ago | |
| CSQA | OSCAR | AUROC 85.1 | 107 | 2mo ago | |
| BioASQ | | AUROC 81.13 | 104 | 5d ago | |
| RAGTruth (test) | TPA | AUROC 0.9096 | 99 | 22d ago | |
| PopQA | PR | AUC 96.18 | 97 | 17h ago | |
| TruthfulQA | CausalGaze | AUROC 0.8851 | 91 | 23h ago | |
| NQ (test) | CDES | AUC ROC 95.2 | 91 | 26d ago | |
| HELM Passage Level v1.0 (test) | MIND | AUC 0.9599 | 84 | 3mo ago | |
| HELM Sentence Level v1.0 (test) | MIND | AUC 0.8835 | 84 | 3mo ago | |
| SQuAD | ID | AUROC 0.89 | 82 | 5d ago | |
| HaluBench | F: Ans. Expect. | AUROC 97 | 75 | 14d ago | |
| Math | SpikeScore | Mean AUROC 81.57 | 72 | 3mo ago | |
| Company | Latent Debate Detector | AUC-ROC 0.93 | 68 | 3mo ago | |
| NQ-Open | DRIFT | AUROC 0.8843 | 63 | 5d ago | |
| MMLU | Lyapunov Probes | AUPRC 87.48 | 62 | 27d ago | |