| Dataset Name | SOTA Method | Metric | Value | Trend | Last updated |
|---|---|---|---|---|---|
| PerplexityAI (test) | DYDECOMP | Verification Confidence | 82.3 | 52 | 4d ago |
| ChartCheck | MEVER | Macro F1 | 0.643 | 38 | 4d ago |
| AIChartClaim | MEVER | Macro F1 | 71.6 | 38 | 4d ago |
| MR2 | MEVER | Macro F1 | 77.7 | 32 | 4d ago |
| Mocheg | MEVER | Macro F1 | 49.7 | 32 | 4d ago |
| FactKG (test) | SimGRAG | Average Accuracy | 86.8 | 20 | 4d ago |
| DIALFACT (val) | Aug-WoW | Accuracy | 70.4 | 18 | 4d ago |
| DIALFACT (test) | Aug-WoW | Accuracy | 69.2 | 18 | 4d ago |
| SCIFACT | InfoRE + CoT | Accuracy | 94.32 | 12 | 4d ago |
| FEVEROUS | InfoRE + CoT | Accuracy | 0.9567 | 12 | 4d ago |
| HOVER 4-hop | InfoRE + CoT | Accuracy | 73.62 | 12 | 4d ago |
| HOVER 3-hop | InfoRE + CoT | Accuracy | 75.16 | 12 | 4d ago |
| HOVER 2-hop | InfoRE + CoT | Accuracy | 76.69 | 12 | 4d ago |
| LIAR (test) | HiSS | Precision | 46.8 | 12 | 4d ago |
| HoVer (test) | TOME-2 | Accuracy | 73.1 | 12 | 4d ago |
| ChatGPT (test) | DYDECOMP | Verification Confidence | 82.4 | 11 | 4d ago |
| FEVER (test) | RAG | Accuracy | 72.5 | 10 | 4d ago |
| FM2 (dev) | TOME-2 | Accuracy | 68.4 | 8 | 4d ago |
| HoVer | ADOPT-Joint | Accuracy | 71 | 6 | 4d ago |
| PolitiFact | CICD | Micro F1 | 57.2 | 6 | 4d ago |
| Snopes | CICD | Micro F1 | 0.846 | 6 | 4d ago |
| CheckThat! S (100+/100+) 2021 re-annotated (dev) |  | Agreement | 57.9 | 5 | 4d ago |
| AVeriTeC (dev) | Althea | Supported F1 | 68 | 4 | 4d ago |
| FEVEROUS (test) | ClaimPKG | Accuracy | 83.8 | 4 | 4d ago |
| HoVer (dev) | TOME-2 | Accuracy | 74.1 | 4 | 4d ago |