| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Fact Verification | FEVER | Accuracy | 53.9 | 72 |
| End-to-End Defense in RAG | FEVER | ASR | 0 | 63 |
| Fact Verification | FEVER (dev) | Label Accuracy | 82.1 | 57 |
| Model Editing | FEVER | Efficacy | 98.23 | 49 |
| Fact Verification | FEVER (val) | True Deferral-Advice Loss | 0.555 | 48 |
| Complex reasoning | FEVER (test) | Macro F1 | 85.18 | 37 |
| Model Editing | FEVER 20K edits (test) | Efficacy | 99.07 | 36 |
| Feature Attribution | FEVER | Comprehensiveness | 0.75 | 33 |
| Fact Verification | FEVER (test) | LA Score | 79.47 | 32 |
| Fact verification | FEVER | Accuracy | 87.45 | 30 |
| Lifelong Model Editing | FEVER | Efficacy | 98.38 | 27 |
| Fact Verification | FEVER 1.0 (dev) | Label Accuracy | 89.07 | 23 |
| Information Retrieval | FEVER BEIR | nDCG | 0.948 | 22 |
| Claim Correction | FEVER Retrieved evidence | SARI (%) | 50.7141 | 21 |
| Passage Reranking | FEVER BEIR | NDCG@10 | 73.47 | 19 |
| Fact Verification | FEVER | EM | 61.1 | 18 |
| Fact Verification | FEVER | F1 Score | 53.9 | 18 |
| Fact Extraction and Verification | FEVER (test) | Label Accuracy (LA) | 75.96 | 18 |
| Explanation Evaluation | FEVER (test) | Sufficiency | 9.72 | 16 |
| Fact Verification | FEVER-Symmetric | Precision | 88 | 16 |
| Knowledge Poisoning Attack | FEVER k=10 (test) | Attack Success Rate (ASR) | 73 | 15 |
| Fact Verification (Adversarial Claim Rewriting) | FEVER | ASR | 2.63 | 15 |
| Sentence-Level Confidence Prediction | FEVER | AUROC | 0.737 | 15 |
| Document Reranking | FEVER | NDCG@5 | 81.556 | 14 |
| Fact-checking | FEVER | F1 Macro | 94.3 | 14 |