| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| TriviaQA | SAPLMA | AUROC | 0.95 | 438 | 4d ago |
| TriviaQA (test) | HARP | AUC-ROC | 92.9 | 183 | 16d ago |
| HotpotQA | PR | AUROC | 0.928 | 163 | 10d ago |
| HaluEval (test) | CDES | AUC-ROC | 97.1 | 126 | 1mo ago |
| CSQA | OSCAR | AUROC | 85.1 | 107 | 15d ago |
| TruthfulQA (test) | SDES | AUC-ROC | 89.5 | 105 | 16d ago |
| TruthfulQA | ARS (CCS) | AUC (ROC) | 0.9417 | 102 | 4d ago |
| NQ | PR | AUC | 0.8645 | 102 | 1mo ago |
| CoQA | SpikeScore | Mean AUROC | 0.8584 | 100 | 24d ago |
| GSM8K | ARS (CCS) | AUROC | 90.37 | 93 | 4d ago |
| PopQA | PR | AUC | 96.18 | 88 | 1mo ago |
| NQ (test) | CDES | AUC ROC | 95.2 | 84 | 1mo ago |
| HELM Passage Level v1.0 (test) | MIND | AUC | 0.9599 | 84 | 1mo ago |
| HELM Sentence Level v1.0 (test) | MIND | AUC | 0.8835 | 84 | 1mo ago |
| RAGTruth (test) | TPA | AUROC | 0.9096 | 83 | 1mo ago |
| HaluEval | CausalGaze | F1 Score | 83.6 | 75 | 4d ago |
| Math | SpikeScore | Mean AUROC | 81.57 | 72 | 1mo ago |
| Company | Latent Debate Detector | AUC-ROC | 0.93 | 68 | 1mo ago |
| NQ-Open | DRIFT | AUROC | 0.8843 | 61 | 4d ago |
| GSM8K (test) | HALLUGUARD | AUROC (Reference) | 79.01 | 48 | 1mo ago |
| SQuAD (test) | HALLUGUARD | AUROCr | 83.8 | 48 | 1mo ago |
| Average Cross-domain | SpikeScore | Mean AUROC | 0.7874 | 48 | 1mo ago |
| SVAMP | SpikeScore | Mean AUROC | 78.37 | 48 | 1mo ago |
| Belebele | SpikeScore | Mean AUROC | 0.7719 | 48 | 1mo ago |
| CommonsenseQA | SpikeScore | Mean AUROC | 0.7563 | 48 | 1mo ago |