Multi-hop Faithfulness Hallucination Detection on HoVer Refined
[Chart: Macro F1 over time. Best result: FaithLens, 82.9 Macro F1 (Dec 23, 2025).]
Evaluation Results

| Method | Configuration | Date | Macro F1 |
|---|---|---|---|
| FaithLens | backbone=Llama-3.1-8B-... | 2025.12 | 82.9 |
| GPT-4.1 | | 2025.12 | 82.6 |
| Llama-3.1-405B-Inst | | 2025.12 | 81.6 |
| o3 | | 2025.12 | 81.1 |
| ClearCheck | backbone=Llama-3.1-8B-... | 2025.12 | 80.3 |
| Claude-3.7-Sonnet | | 2025.12 | 80.2 |
| DeepSeek-V3.2 | thinking=true | 2025.12 | 80 |
| o1 | | 2025.12 | 79.9 |
| o3-mini | | 2025.12 | 78.5 |
| DeepSeek-V3.2 | thinking=false | 2025.12 | 76.7 |
| MiniCheck | parameters=7B | 2025.12 | 74.9 |
| GPT-4o | | 2025.12 | 73.6 |
| AlignScore | parameters=355M | 2025.12 | 73.3 |
| FactCG | parameters=435M | 2025.12 | 73.1 |