| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| TriviaQA (test) | PrisonBreak | Accuracy | 73 | 29 | 1mo ago |
| LSQA OOD | UALIGN | Precision | 79.56 | 24 | 1mo ago |
| ID Datasets Average | UALIGN | Precision | 70.82 | 24 | 1mo ago |
| NQ-Open ID | ICL-COT | Precision | 57.34 | 24 | 1mo ago |
| SciQ (ID) | UALIGN | Precision | 76.44 | 24 | 1mo ago |
| TVQA ID | UALIGN | Precision | 82.1 | 24 | 1mo ago |
| Factual Category Average (test) | | Accuracy | 31.38 | 18 | 1mo ago |
| SimpleQA (test) | HEART | Accuracy | 79.07 | 10 | 1mo ago |
| TruthfulQA | B1 entropy | AUROC | 59.9 | 4 | 23d ago |
| TruthfulQA and HotpotQA | C-GAN | Hallucination Rate | 19.7 | 3 | 12d ago |