| Dataset | Method | Metric | Value | |
|---|---|---|---|---|
| Average of 6 datasets | Dissimilarity + beam search | PRR | 65 | 120 |
| Aggregated experimental datasets (XSum, SAMSum, CNN, WMT19, MedQUAD, TruthfulQA, CoQA, SciQ, TriviaQA, MMLU, GSM8K) (test) | TAD | Mean Rank | 1 | 88 |
| SVHN (test) | SR | Accuracy | 100 | 80 |
| OKVQA, VizWiz, VQA-RAD, and AdVQA (average) | VIG-TUQ | AUROC | 0.691 | 70 |
| MuSiQue, 500 randomly sampled queries (test) | R2C | AUROC | 0.8322 | 70 |
| HotpotQA, 500 randomly sampled queries (test) | R2C | AUROC | 83.25 | 70 |
| PopQA, 500 randomly sampled queries (test) | R2C | AUROC | 0.8709 | 70 |
| Aggregate of all 11 datasets | TAD | Mean PRR | 55.7 | 44 |
| InfoSeek (val) | PE | Accuracy | 14.7 | 42 |
| Encyclopedic VQA (EVQA) (test) | PE | Accuracy | 16.9 | 42 |
| ImageNet-10 | lowrank-KFAC | NLL | 0.266 | 42 |
| CIFAR-10 | lowrank-KFAC | NLL | 0.256 | 42 |
| Fashion-MNIST | lowrank-KFAC | NLL | 0.248 | 42 |
| Vision datasets, averaged (test) | SGPU | AUROC | 81.7 | 36 |
| LongFact | Ecc | PCC | -0.017 | 32 |
| BIO | Ecc | PCC | -0.129 | 32 |
| MulFactTrap (test) | RUfact | ROC AUC | 0.898 | 32 |
| Mixed dataset (real and fake biographies) | RUgen | ROC AUC | 0.9001 | 32 |
| MAQA ∆K−1 | Structure-Aware Minimum Bayes Risk Decoding | KL Divergence AUC | 0.757 | 28 |
| CNN/DailyMail | Structure-Aware Minimum Bayes Risk Decoding | Hamming AUC | 0.745 | 28 |
| WMT19 | KLE | COMET AUC | 0.608 | 28 |
| MAQA | Structure-Aware Minimum Bayes Risk Decoding | Hamming AUC | 83.5 | 28 |
| SciQ (test) | SENTSAR | AUROC | 74.5 | 28 |
| MMLU-Pro (test) | CAGE-CAL | AUROC | 77.74 | 24 |
| MSD Task01 (test) | ACQR | Coverage (%) | 94.22 | 24 |
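AUROC (also listed as ROC AUC) is the most common metric in the table. It is bounded in [0, 1], but some rows report it scaled by 100 (for example 83.25 versus 0.8322). The sketch below computes AUROC through its Mann-Whitney formulation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. It assumes a common uncertainty-estimation setup in which the score is an uncertainty value and the positive class is an incorrect answer. The individual methods in the table may define scores and labels differently. The function name `auroc` is hypothetical. The sketch uses an O(n·m) pairwise loop for clarity rather than speed.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic (illustrative sketch).

    scores: higher value = example considered more likely positive
            (assumed here: higher uncertainty = more likely an error)
    labels: 1 for positive (assumed: incorrect answer), 0 for negative
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    # Compare every positive score against every negative score.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of (positive, negative) pairs ranked correctly.
    return wins / (len(pos) * len(neg))


print(auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

In the usage line, three of the four positive/negative pairs are ordered correctly, so the result is 0.75. A value of 0.5 corresponds to chance, and multiplying by 100 gives the percentage-style figures that appear in some rows.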