| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| Short-form QA (Average of NQ, PopQA, TriviaQA, SimpleQA) (test) | FRANQ condition-calibrated | PR-AUC 71.1 | 60 | 1mo ago |
| TriviaQA-generated Fact Dataset topic-wise leave-one-out strategy 1.0 | SAPLMA | Accuracy (billturnbull) 63.9 | 20 | 3mo ago |
| TriviaQA (test) | DynHD | AUROC 85.5 | 8 | 2mo ago |
| CSQA (test) | TraceDet | AUROC 74.8 | 8 | 2mo ago |
| HotpotQA (test) | DynHD | AUROC 73.3 | 8 | 2mo ago |
| Multiple TriviaQA, HotpotQA, CSQA | DynHD | Average AUROC 72.9 | 4 | 2mo ago |