| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| RealHitBench | DeepSeek-R1 | Exact Match | 70.91 | 94 | 3d ago |
| PubHealth | KG-CRAFT L3.3 | Balanced Accuracy | 78.66 | 26 | 1mo ago |
| COVID-Fact | OpenAI o1 | Balanced Accuracy | 75.9 | 22 | 1mo ago |
| LIAR-RAW | KG-CRAFT | Precision | 77.38 | 20 | 1mo ago |
| FEVEROUS (test) | Trification | Macro F1 | 74.72 | 20 | 1mo ago |
| InFi-Check-FG 1.0 (test) | Llama-3.1-8B-Instruct | PredE | 18.82 | 18 | 1mo ago |
| FeLMWk | PCC | F1 (True) | 0.79 | 16 | 1mo ago |
| HOVER 4-hop (test) | Trification | Macro F1 | 66.23 | 16 | 1mo ago |
| HOVER 3-hop (test) | Trification | Macro F1 | 66.42 | 16 | 1mo ago |
| HOVER 2-hop (test) | Trification | Macro F1 | 75.13 | 16 | 1mo ago |
| PolitiFact | FakeCheckRAG | Real F1 Score | 85 | 15 | 1mo ago |
| Average across General and Medical Domains | | Overall Average | 73.6 | 15 | 1mo ago |
| SCIFact | OpenAI o1 | Balanced Accuracy | 90.3 | 15 | 1mo ago |
| ExpertQA | GraphCheck | Balanced Accuracy | 60.3 | 15 | 1mo ago |
| SummEval | | Balanced Accuracy | 77.3 | 15 | 1mo ago |
| AggreFact CNN | GraphEval | Balanced Accuracy | 69.5 | 15 | 1mo ago |
| AggreFact Xsum | GPT-4o | Balanced Accuracy | 76.4 | 15 | 1mo ago |
| Causal and Downstream Robustness Ablation Suite (averaged over 4 models) | HETA | Fact EM Δ | 3.7 | 14 | 2d ago |
| FEVEROUS | | Macro F1 | 89.4 | 14 | 1mo ago |
| FEVER | | Macro F1 | 94.3 | 14 | 1mo ago |
| DeepFact-Bench (test) | DeepFact-Eval | Accuracy | 87.2 | 13 | 1mo ago |
| FEVER | WKGFC | Balanced Accuracy | 91.9 | 12 | 1mo ago |
| FEVEROUS-S | RRC | Macro F1 | 72.55 | 12 | 1mo ago |
| HOVER | FOLK | Macro F1 (2-hop) | 71.82 | 12 | 1mo ago |
| LIAR | | Accuracy | 79 | 12 | 1mo ago |
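Most rows above report classification metrics over claim-verification labels. The code below is a minimal sketch of the two most common ones. It assumes the conventional definitions: Balanced Accuracy is the unweighted mean of per-class recall over the classes present in the gold labels, and Macro F1 is the unweighted mean of per-class F1. It is not any benchmark's official scorer, and the label strings (`SUPPORTED`, `REFUTED`) are hypothetical.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class recall, over classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, over classes seen in gold or predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical toy labels: three SUPPORTED claims and one REFUTED claim.
gold = ["SUPPORTED", "SUPPORTED", "SUPPORTED", "REFUTED"]
pred = ["SUPPORTED", "SUPPORTED", "REFUTED", "REFUTED"]
print(round(balanced_accuracy(gold, pred), 4))  # 0.8333
print(round(macro_f1(gold, pred), 4))           # 0.7333
```

On this toy example, plain accuracy is 0.75. Balanced Accuracy (about 0.833) and Macro F1 (about 0.733) differ from it because both give the single REFUTED claim the same weight as the SUPPORTED class. Most scores in the table appear to be on a 0 to 100 scale, that is, such values multiplied by 100.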