Binary Fact-checking on REVEAL
[Chart: best Macro-F1 over time. Current best: GPT-5, 93.7 Macro-F1, Jan 10, 2026. Updated 4d ago.]
Evaluation Results
Method                   Model Category   Date      Macro-F1
GPT-5                    The Sta...       2026.01   93.7
GPT-4.1                  The Sta...       2026.01   93.2
o3                       The Sta...       2026.01   92.2
AlignScore-large         Special...       2026.01   92.2
DeepSeek-V3.2-NoThink    The Sta...       2026.01   91
MiniCheck                Special...       2026.01   91
FactCG                   Special...       2026.01   90
InFi-Checker-Qwen        Special...       2026.01   90
Claude-3.7-Sonnet        The Sta...       2026.01   88
InFi-Checker-Llama       Special...       2026.01   87.7
ClearCheck (COT)         Special...       2026.01   87
GPT-4o                   The Sta...       2026.01   86.9
Qwen3-8B                 The Ope...       2026.01   83.2
Llama-3.1-8B-Instruct    The Ope...       2026.01   78.2
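
The page does not say how Macro-F1 is computed for this benchmark. As a rough guide to reading the scores, the sketch below uses the standard definition: the unweighted mean of the per-class F1 scores, here over two classes. The label encoding (1 and 0), the function names, and the scaling to a 0-100 range are assumptions made for illustration, not details taken from the leaderboard.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention chosen here: F1 is 0 when the class never appears
    # in either the gold labels or the predictions.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             classes: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of per-class F1, scaled to 0-100 (assumed scale)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = [f1_for_class(y_true, y_pred, c) for c in classes]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical gold labels and predictions, not benchmark data.
    gold = [1, 1, 1, 0, 0, 0]
    pred = [1, 1, 0, 0, 0, 1]
    print(round(macro_f1(gold, pred), 2))
```

Because each class contributes equally to the average, a checker that does well on only one of the two labels is penalized more under Macro-F1 than under plain accuracy when the labels are imbalanced.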