Veracity Assessment on FacTool-QA
Top result: MERMAID (True F1: 92)
[Chart: True F1, False F1 and Macro-F1 of submitted methods by date (Jan 29, 2026)]
Evaluation Results
| Method | LLM | Date | True F1 | False F1 | Macro-F1 |
|---|---|---|---|---|---|
| MERMAID | GPT-4o | 2026.01 | 92 | 68 | 80 |
| MERMAID | GPT-5-mini | 2026.01 | 90 | 66 | 78 |
| FIRE | GPT-4o | 2026.01 | 89 | 66 | 77 |
| SAFE | GPT-4o | 2026.01 | 88 | 63 | 76 |
| MERMAID | OSS-120B | 2026.01 | 88 | 56 | 72 |
| MERMAID | Qwen-2.5-70B | 2026.01 | 86 | 59 | 73 |
| MERMAID | OSS-20B | 2026.01 | 86 | 54 | 70 |
| MERMAID | Qwen-2.5-7B | 2026.01 | 85 | 54 | 70 |
| FacTool | GPT-4o | 2026.01 | 84 | 58 | 71 |
| FactCheck-GPT | GPT-4o | 2026.01 | 84 | 60 | 72 |
| MERMAID | LLaMA-3.1-70B-Inst | 2026.01 | 69 | 44 | 57 |
| MERMAID | LLaMA-3.1-8B-Inst | 2026.01 | 63 | 41 | 52 |
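The leaderboard does not define its metrics. The numbers are consistent with True F1 and False F1 being per-class F1 scores (treating claims labeled true, respectively false, as the positive class) and Macro-F1 being their unweighted mean: every row's Macro-F1 is within 0.5 of (True F1 + False F1) / 2, a gap presumably due to rounding. The sketch below assumes that reading. The function names and the boolean label encoding are hypothetical and not taken from any of the listed methods.

```python
from typing import List, Sequence, Tuple


def class_f1(y_true: Sequence[bool], y_pred: Sequence[bool], positive: bool) -> float:
    """F1 score with `positive` treated as the positive class (assumed metric definition)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def veracity_scores(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (True F1, False F1, Macro-F1) for boolean gold and predicted labels."""
    true_f1 = class_f1(y_true, y_pred, positive=True)
    false_f1 = class_f1(y_true, y_pred, positive=False)
    # Macro-F1: unweighted mean of the two per-class scores.
    return true_f1, false_f1, (true_f1 + false_f1) / 2


# (method / LLM, True F1, False F1, Macro-F1) copied from the results table.
LEADERBOARD: List[Tuple[str, int, int, int]] = [
    ("MERMAID / GPT-4o", 92, 68, 80),
    ("MERMAID / GPT-5-mini", 90, 66, 78),
    ("FIRE / GPT-4o", 89, 66, 77),
    ("SAFE / GPT-4o", 88, 63, 76),
    ("MERMAID / OSS-120B", 88, 56, 72),
    ("MERMAID / Qwen-2.5-70B", 86, 59, 73),
    ("MERMAID / OSS-20B", 86, 54, 70),
    ("MERMAID / Qwen-2.5-7B", 85, 54, 70),
    ("FacTool / GPT-4o", 84, 58, 71),
    ("FactCheck-GPT / GPT-4o", 84, 60, 72),
    ("MERMAID / LLaMA-3.1-70B-Inst", 69, 44, 57),
    ("MERMAID / LLaMA-3.1-8B-Inst", 63, 41, 52),
]

if __name__ == "__main__":
    gold = [True, True, True, False, False]
    pred = [True, True, False, False, True]
    print(veracity_scores(gold, pred))  # approximately (0.667, 0.5, 0.583)

    # Each reported Macro-F1 lies within rounding distance of the per-class mean.
    for name, t, f, m in LEADERBOARD:
        assert abs((t + f) / 2 - m) <= 0.5, name
```

Macro-averaging weights both classes equally, so a method can post a high True F1 yet a much lower Macro-F1 if it handles false claims poorly, as every row in the table shows.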