| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| TriviaQA | SAPLMA | AUROC | 0.95 | 265 | 3d ago |
| TriviaQA (test) | MLP Probe | AUC-ROC | 92.23 | 169 | 3d ago |
| HaluEval (test) | CDES | AUC-ROC | 97.1 | 126 | 3d ago |
| HotpotQA | PR | AUROC | 0.928 | 118 | 3d ago |
| NQ | PR | AUC | 0.8645 | 102 | 3d ago |
| TruthfulQA (test) | SDES | AUC-ROC | 89.5 | 91 | 3d ago |
| PopQA | PR | AUC | 96.18 | 88 | 3d ago |
| NQ (test) | CDES | AUC ROC | 95.2 | 84 | 3d ago |
| HELM Passage Level v1.0 (test) | MIND | AUC | 0.9599 | 84 | 3d ago |
| HELM Sentence Level v1.0 (test) | MIND | AUC | 0.8835 | 84 | 3d ago |
| RAGTruth (test) | TPA | AUROC | 0.9096 | 83 | 3d ago |
| Math | SpikeScore | Mean AUROC | 81.57 | 72 | 3d ago |
| Company | Latent Debate Detector | AUC-ROC | 0.93 | 68 | 3d ago |
| CSQA | fDBD | AUROC | 72.47 | 55 | 3d ago |
| GSM8K | ARS (CCS) | AUROC | 90.37 | 53 | 3d ago |
| GSM8K (test) | HALLUGUARD | AUROC (Reference) | 79.01 | 48 | 3d ago |
| SQuAD (test) | HALLUGUARD | AUROCr | 83.8 | 48 | 3d ago |
| Average Cross-domain | SpikeScore | Mean AUROC | 0.7874 | 48 | 3d ago |
| SVAMP | SpikeScore | Mean AUROC | 78.37 | 48 | 3d ago |
| CoQA | SpikeScore | Mean AUROC | 0.8584 | 48 | 3d ago |
| Belebele | SpikeScore | Mean AUROC | 0.7719 | 48 | 3d ago |
| CommonsenseQA | SpikeScore | Mean AUROC | 0.7563 | 48 | 3d ago |
| TruthfulQA | ARS (CCS) | AUC (ROC) | 0.9417 | 47 | 3d ago |
| CoQA | LSC | AUCs | 77.5 | 42 | 3d ago |
| VQA-Med 2019 (All) | VASE + V-Loop | AUC | 76.1 | 39 | 3d ago |
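
The leaderboard reports AUROC-style scores on two different scales. Some are fractions (for example 0.95 for SAPLMA on TriviaQA), while others are percentages (for example 92.23 for the MLP probe on TriviaQA (test)). Even the same method's "Mean AUROC" appears both ways: 81.57 on Math versus 0.7874 on Average Cross-domain. The sketch below is illustrative only. It computes AUROC as the probability that a randomly chosen positive example outscores a randomly chosen negative one, with ties counting as half. It also includes `to_percent`, a hypothetical helper that puts reported values on a common 0-100 scale under the loud assumption that any value of at most 1 is a fraction. Neither function comes from the leaderboard or from any of the listed methods.

```python
from itertools import product


def auroc(scores, labels):
    """AUROC via pairwise comparison (Mann-Whitney formulation).

    Returns the fraction of (positive, negative) pairs in which the
    positive example (label 1) has the higher score; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def to_percent(value):
    """Hypothetical helper: rescale a reported AUROC to 0-100.

    ASSUMPTION: values <= 1 are fractions and larger values are already
    percentages. The leaderboard itself does not state each entry's scale.
    """
    return value * 100 if value <= 1 else value


if __name__ == "__main__":
    # Two positives (0.9, 0.4) vs two negatives (0.5, 0.2): 3 of 4 pairs won.
    print(auroc([0.9, 0.4, 0.5, 0.2], [1, 1, 0, 0]))
    # Fraction-scale and percent-scale entries on a common scale.
    print(to_percent(0.9599), to_percent(92.23))
```

The rescaling only makes values within a single row readable on one scale. Scores on different datasets, splits, or metric variants (such as "AUROC (Reference)" or "AUCs") are not directly comparable even after rescaling.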