| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Claim Verification | HoVer (test) | Accuracy 73.1 | 31 |
| Fact-checking | HOVER 4-hop (test) | Macro F1 66.23 | 16 |
| Fact-checking | HOVER 3-hop (test) | Macro F1 66.42 | 16 |
| Fact-checking | HOVER 2-hop (test) | Macro F1 75.13 | 16 |
| Multi-hop Faithfulness Hallucination Detection | HoVer Refined | Macro F1 82.9 | 14 |
| Fact-checking | HOVER | Macro F1 (2-hop) 71.82 | 12 |
| Claim Verification | HOVER 4-hop | Accuracy 73.62 | 12 |
| Claim Verification | HOVER 3-hop | Accuracy 75.16 | 12 |
| Claim Verification | HOVER 2-hop | Accuracy 76.69 | 12 |
| Retrieval-based Question Answering | HoVer 4-HOP | Recall@100 71.5 | 8 |
| Multi-hop verification | HoVer | LLM Throughput (token/s) 1,255 | 8 |
| Agentic Workflow Performance (Iterative Refinement Loops) | HoVer + LangChain | Latency (s) 76.15 | 8 |
| Fact Verification | HOVER (test) | AUROC 56.6 | 8 |
| Fact Extraction and Claim Verification | HoVer (test) | Recall 63.2 | 7 |
| Multi-hop Fact Verification | HoVer 4-Hop | Macro F1 63 | 7 |
| Multi-hop Fact Verification | HoVer 3-Hop | Macro F1 58 | 7 |
| Multi-hop Fact Verification | HoVer 2-Hop | Macro F1 71 | 7 |
| Retrieval | HoVer | Recall@5 0.768 | 7 |
| Claim Verification | HoVer | Accuracy 71 | 6 |
| Multi-hop Fact Verification | HoVer | Correctness 66 | 5 |
| Prompt Token Efficiency | HoVer | Max System Prompt Tokens 5,252 | 4 |
| Multi-hop Claim Verification | HoVer (test) | Accuracy (Test) 79.4 | 4 |
| Prompt Optimization | Hover | Score 52.33 | 4 |
| Generative Evolution | HoVer (val) | Score (%) 42 | 4 |
| Multi-hop fact verification | HoVer few-shot | Recall 56 | 4 |