HoVer

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Claim Verification | HoVer (test) | Accuracy | 73.1 | 31 |
| Fact-checking | HOVER 4-hop (test) | Macro F1 | 66.23 | 16 |
| Fact-checking | HOVER 3-hop (test) | Macro F1 | 66.42 | 16 |
| Fact-checking | HOVER 2-hop (test) | Macro F1 | 75.13 | 16 |
| Multi-hop Faithfulness Hallucination Detection | HoVer Refined | Macro F1 | 82.9 | 14 |
| Fact-checking | HOVER | Macro F1 (2-hop) | 71.82 | 12 |
| Claim Verification | HOVER 4-hop | Accuracy | 73.62 | 12 |
| Claim Verification | HOVER 3-hop | Accuracy | 75.16 | 12 |
| Claim Verification | HOVER 2-hop | Accuracy | 76.69 | 12 |
| Retrieval-based Question Answering | HoVer 4-HOP | Recall@100 | 71.5 | 8 |
| Multi-hop verification | HoVer | LLM Throughput (token/s) | 1,255 | 8 |
| Agentic Workflow Performance (Iterative Refinement Loops) | HoVer + LangChain | Latency (s) | 76.15 | 8 |
| Fact Verification | HOVER (test) | AUROC | 56.6 | 8 |
| Fact Extraction and Claim Verification | HoVer (test) | Recall | 63.2 | 7 |
| Multi-hop Fact Verification | HoVer 4-Hop | Macro-F1 | 63 | 7 |
| Multi-hop Fact Verification | HoVer 3-Hop | Macro F1 | 58 | 7 |
| Multi-hop Fact Verification | HoVer 2-Hop | Macro F1 | 71 | 7 |
| Retrieval | HoVer | Recall@5 | 0.768 | 7 |
| Claim Verification | HoVer | Accuracy | 71 | 6 |
| Multi-hop Fact Verification | HoVer | Correctness | 66 | 5 |
| Prompt Token Efficiency | HoVer | Max System Prompt Tokens | 5,252 | 4 |
| Multi-hop Claim Verification | HoVer (test) | Accuracy (Test) | 79.4 | 4 |
| Prompt Optimization | Hover | Score | 52.33 | 4 |
| Generative Evolution | HoVer (val) | Score (%) | 42 | 4 |
| Multi-hop fact verification | HoVer few-shot | Recall | 56 | 4 |
Showing 25 of 27 rows