| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Fact Verification | FEVER | Accuracy | 53.9 | 67 |
| Fact Verification | FEVER (dev) | Label Accuracy | 82.1 | 57 |
| Model Editing | FEVER | Efficacy | 98.23 | 49 |
| Fact Verification | FEVER (val) | True Deferral-Advice Loss | 0.555 | 48 |
| Fact Verification | FEVER (test) | LA Score | 79.47 | 32 |
| Fact Verification | FEVER 1.0 (dev) | Label Accuracy | 89.07 | 23 |
| Information Retrieval | FEVER BEIR | nDCG | 0.948 | 22 |
| Fact Verification | FEVER | EM | 61.1 | 18 |
| Fact Verification | FEVER | F1 Score | 53.9 | 18 |
| Fact Extraction and Verification | FEVER (test) | Label Accuracy (LA) | 75.96 | 18 |
| Explanation Evaluation | FEVER (test) | Sufficiency | 9.72 | 16 |
| Fact Verification | FEVER-Symmetric | Precision | 88 | 16 |
| Fact Verification (Adversarial Claim Rewriting) | FEVER | ASR | 2.63 | 15 |
| Fact-checking | FEVER | F1 Macro | 94.3 | 14 |
| Fact Verification | FEVER 1.0 (test) | Label Accuracy | 74.07 | 14 |
| Classification | FEVER Symmetric v2 1.0 | Accuracy | 69.1 | 13 |
| Classification | FEVER v1 (ID) | Accuracy | 87.5 | 13 |
| Fact-Checking | FEVER | Balanced Accuracy | 91.9 | 12 |
| Ad Hoc Retrieval | FEVER | NDCG@10 | 85.5 | 12 |
| Fact Verification | FEVER-S | Accuracy | 54 | 12 |
| Fact Verification | FEVER | Accuracy | 61.4 | 12 |
| Fact-verification | FEVER | Accuracy | 73.73 | 11 |
| Fact Verification | FEVER (test) | Accuracy | 99.7 | 10 |
| Sentence-Level Confidence Prediction | FEVER | AUROC | 0.7 | 10 |
| global fact consistency verification | FEVER | Precision | 99.5 | 10 |