| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| FAKEWIKI Indirect | SCORINGMODEL | Recall@10 | 21.6 | 26 | 26d ago |
| FAKEWIKI NoiseInjection | SCORINGMODEL | Recall@10 | 63.5 | 26 | 26d ago |
| FAKEWIKI RolePlay | SCORINGMODEL | Recall@10 | 63.8 | 26 | 26d ago |
| FAKEWIKI Obfuscate | SCORINGMODEL | Recall@10 | 59.5 | 26 | 26d ago |
| FAKEWIKI Clean | SCORINGMODEL | Recall@10 | 78.1 | 26 | 26d ago |
| DocRED (dev) | DREEAM (student) | F1 Score | 57.55 | 14 | 3mo ago |
| Claim-evidence sets (test) | Qwen3-Reranker-8B | MAP | 54.19 | 13 | 3mo ago |
| DocRED (test) | DREEAM (student) | F1 Score | 57.34 | 13 | 3mo ago |
| FEVEROUS 2021 (test) | | Recall@20 | 87.02 | 10 | 6d ago |
| FEVER 2018 (test) | DACLR | Recall@20 | 88.48 | 10 | 6d ago |
| G-bench Medical | G-reasoner | Recall | 93.8 | 10 | 3mo ago |
| G-bench Novel | G-reasoner | Recall | 87.7 | 10 | 3mo ago |
| NQ | Qwen3-Emb-4B + Jina reranker | Recall@5 | 31.9 | 9 | 26d ago |
| 2Wiki | INTRA | Recall@5 | 40.7 | 9 | 26d ago |
| MR2 | MEVER | Recall@1 | 5.9 | 7 | 3mo ago |
| Mocheg | MEVER | Recall@1 | 24.4 | 7 | 3mo ago |
| ChartCheck | MEVER | Recall@1 | 56 | 7 | 3mo ago |
| AIChartClaim | MEVER | Recall@1 | 65.7 | 7 | 3mo ago |
| MR2 | MEVER | Precision@1 | 37.6 | 7 | 3mo ago |
| Mocheg | MEVER | Precision@1 | 53.1 | 7 | 3mo ago |
| MR2 | MEVER | MAP | 19.5 | 7 | 3mo ago |
| Mocheg | MEVER | MAP | 41.6 | 7 | 3mo ago |
| ChartCheck | MEVER | MAP | 63.6 | 7 | 3mo ago |
| AIChartClaim | MEVER | MAP | 71.4 | 7 | 3mo ago |
| FRAMES | HybridDeepSearcher | Evidence Coverage Rate | 55.8 | 6 | 3mo ago |