| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| TabFact | Human | Accuracy | 92.1 | 83 | 1mo ago |
| FEVER | COT-SC + VE | Accuracy | 53.9 | 67 | 1mo ago |
| FEVER (dev) | ROBERTaBase Flexible (IE1) + MSPP (REk) | Label Accuracy | 82.1 | 57 | 1mo ago |
| FEVER (val) | augmented surrogate | True Deferral-Advice Loss | 0.555 | 48 | 1mo ago |
| FEVER (test) | ProoFVer-SB | LA Score | 79.47 | 32 | 1mo ago |
| RAWFC | KG-CRAFT | Precision | 81.63 | 30 | 1mo ago |
| MINE | Hyper-KGGen+ | Accuracy | 84.73 | 28 | 1mo ago |
| LIAR | ERM | F1 Score | 68.6 | 24 | 10d ago |
| FEVER 1.0 (dev) | ProoFVer | Label Accuracy | 89.07 | 23 | 1mo ago |
| InfoTabs (held-out) | GPT-4o | Accuracy | 79.5 | 21 | 22d ago |
| TabFact (held-in) | Gemini-2.5-Pro | Accuracy | 85.02 | 21 | 22d ago |
| FEVER | SAVER | EM | 61.1 | 18 | 8d ago |
| FEVER | FLARE | F1 Score | 53.9 | 18 | 22d ago |
| Creak WikiData (test) | KG-Reasoner with Qwen-2.5-7B | Hits@1 | 97.45 | 17 | 3d ago |
| FactKG | SeleCom | Accuracy | 67.44 | 17 | 1mo ago |
| FEVER-Symmetric | RoBERTa-large | Precision | 88 | 16 | 1mo ago |
| FACT | | Accuracy | 99.44 | 15 | 1mo ago |
| FEVER 1.0 (test) | KGAT | Label Accuracy | 74.07 | 14 | 1mo ago |
| VitaminC | CPO | Accuracy (%) | 54 | 12 | 1mo ago |
| FEVER-S | CPO | Accuracy | 54 | 12 | 1mo ago |
| FEVER | ToT | Accuracy | 61.4 | 12 | 1mo ago |
| FEVER | CDKC | Accuracy | 73.73 | 11 | 1mo ago |
| InfoTabs (test) | MiniCPM-V-2.6 8B | Accuracy | 75.74 | 11 | 1mo ago |
| InfoTabS | | Accuracy | 77.6 | 10 | 1mo ago |
| FEVER (test) | LPF-SPN | Accuracy | 99.7 | 10 | 1mo ago |