| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| 9-dataset aggregate retrieval-free setting (test) | GPT-4.1 | ROC-AUC | 84 | 70 | 1mo ago |
| PerplexityAI (test) | DYDECOMP | Verification Confidence | 82.3 | 52 | 1mo ago |
| ChartCheck | MEVER | Macro F1 | 0.643 | 38 | 1mo ago |
| AIChartClaim | MEVER | Macro F1 | 71.6 | 38 | 1mo ago |
| MR2 | MEVER | Macro F1 | 77.7 | 32 | 1mo ago |
| Mocheg | MEVER | Macro F1 | 49.7 | 32 | 1mo ago |
| HoVer (test) | TOME-2 | Accuracy | 73.1 | 31 | 16d ago |
| AVeriTeC Retrieved (I) (dev) | DebateCV | Accuracy | 73.6 | 28 | 12d ago |
| AVeriTeC Retrieved (H) (dev) | DebateCV | Accuracy | 72.8 | 28 | 12d ago |
| AVeriTeC Golden (dev) | DebateCV | Accuracy | 83.4 | 28 | 12d ago |
| FactKG (test) | SimGRAG | Average Accuracy | 86.8 | 20 | 1mo ago |
| DIALFACT (val) | Aug-WoW | Accuracy | 70.4 | 18 | 1mo ago |
| DIALFACT (test) | Aug-WoW | Accuracy | 69.2 | 18 | 1mo ago |
| AmbiguousSnopes | CO-FACTCHECKER | Precision | 39 | 14 | 2d ago |
| ExClaim | CO-FACTCHECKER | Precision (P) | 34 | 14 | 2d ago |
| SCIFACT | InfoRE + CoT | Accuracy | 94.32 | 12 | 1mo ago |
| FEVEROUS | InfoRE + CoT | Accuracy | 0.9567 | 12 | 1mo ago |
| HOVER 4-hop | InfoRE + CoT | Accuracy | 73.62 | 12 | 1mo ago |
| HOVER 3-hop | InfoRE + CoT | Accuracy | 75.16 | 12 | 1mo ago |
| HOVER 2-hop | InfoRE + CoT | Accuracy | 76.69 | 12 | 1mo ago |
| LIAR (test) | HiSS | Precision | 46.8 | 12 | 1mo ago |
| ChatGPT (test) | DYDECOMP | Verification Confidence | 82.4 | 11 | 1mo ago |
| FEVER (test) | RAG | Accuracy | 72.5 | 10 | 1mo ago |
| LLMAggreFact (test) | ThinknCheck | Binary Accuracy | 78.1 | 9 | 15d ago |
| FM2 (dev) | TOME-2 | Accuracy | 68.4 | 8 | 1mo ago |