Multi-hop Fact Verification on HoVer 4-Hop
[Chart: Macro-F1 over time. Top entry: MERMAID, 63 Macro-F1, Jan 29, 2026.]
Evaluation Results
Method                       Date      Macro-F1
MERMAID (LLM=GPT-4o)         2026.01   63
MERMAID (LLM=GPT-5 Mini)     2026.01   62
FOLK                         2026.01   60
MERMAID (LLM=OSS-120B)       2026.01   60
MERMAID (LLM=Qwen-2.5-70B)   2026.01   57
ProgramFC                    2026.01   53
Self-Ask                     2026.01   52