Fact-checking on HOVER 4-hop (test)
Best Macro F1: 66.23 (Trification)
[Chart: Macro F1 over time; data point dated Nov 29, 2025]
Evaluation Results
Method         Details                    Date     Macro F1
Trification    Model Category=Our, Me...  2025.11  66.23
PACAR          Model Category=III         2025.11  63.82
LoCaL          Model Category=IV          2025.11  61.59
Trification    Model Category=Our, Me...  2025.11  60.99
ProgramFC      Model Category=III, Nu...  2025.11  59.16
ChatGPT        Model Category=II          2025.11  58.73
ProgramFC      Model Category=III, Nu...  2025.11  57.74
Codex          Model Category=II          2025.11  57.27
SearChain      Model Category=IV          2025.11  56.54
DeBERTaV3-NLI  Model Category=I           2025.11  56
Flan-T5        Model Category=II          2025.11  55.42
Trification    Model Category=Our, Me...  2025.11  55.2
RoBERTa-NLI    Model Category=I           2025.11  52.4
MULTIVERS      Model Category=I           2025.11  51.86
LisT5          Model Category=I           2025.11  50.46
BERT-FC        Model Category=I           2025.11  48.57