| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| GSM8k | Sheeps | ROC-AUC | 88.5 | 66 | 1mo ago |
| MMLU | TAD | ROC-AUC | 0.947 | 66 | 1mo ago |
| TriviaQA | Mean Token Entropy | ROC-AUC | 87.4 | 66 | 1mo ago |
| SciQ | SATRMD | ROC-AUC | 86.1 | 66 | 1mo ago |
| CoQA | TAD | ROC-AUC | 74.7 | 66 | 1mo ago |
| TruthfulQA | TAD | ROC-AUC | 0.744 | 66 | 1mo ago |
| MedQUAD | MIND | ROC-AUC | 0.928 | 66 | 1mo ago |
| WMT19 | LookBackLens | ROC-AUC | 0.845 | 66 | 1mo ago |
| CNN | TAD | ROC-AUC | 75.8 | 66 | 1mo ago |
| SamSum | LookBackLens | ROC-AUC | 82.1 | 66 | 1mo ago |
| XSum | TAD | ROC-AUC | 85.9 | 66 | 1mo ago |
| XSum, SamSum, CNN, WMT19, MedQUAD, TruthfulQA, CoQA, SciQ, TriviaQA, MMLU, GSM8k Aggregate | TAD | Mean ROC-AUC | 0.814 | 22 | 1mo ago |
| Aggregate All Datasets | TAD | Mean PRR | 56.3 | 22 | 1mo ago |
| MMLU | SATRMD+MSP | PRR (Accuracy) | 81.6 | 14 | 1mo ago |
| GSM8k | SATRMD+MSP | PRR (Accuracy) | 64.2 | 14 | 1mo ago |
| TriviaQA | DegMat NLI Score Entail. | PRR (AlignScore) | 0.714 | 14 | 1mo ago |
| SciQ | HUQ-SATRMD | PRR (AlignScore) | 65.3 | 14 | 1mo ago |
| CoQA | | PRR (AlignScore) | 47.2 | 14 | 1mo ago |
| TruthfulQA | SATRMD+MSP | PRR (AlignScore) | 35.3 | 14 | 1mo ago |
| MedQUAD | SATRMD+MSP | PRR (ROUGE-L) | 46.6 | 14 | 1mo ago |
| PubMedQA | SATRMD+MSP | PRR (ROUGE-L) | 0.372 | 14 | 1mo ago |
| CNN | Perplexity | PRR (ROUGE-L) | 0.15 | 14 | 1mo ago |
| SamSum | HUQ-SATRMD | PRR (ROUGE-L) | 48.6 | 14 | 1mo ago |
| MMLU Out-of-domain | HUQ-SATRMD | PRR (Accuracy) | 0.77 | 8 | 1mo ago |
| SciQ Out-of-domain | HUQ-SATRMD | PRR (AlignScore) | 64.4 | 8 | 1mo ago |