Binary Fact-checking on Claim Verify
Best Macro F1: 0.896 (InFi-Checker-Qwen, Jan 10, 2026)
Updated 4d ago
Evaluation Results
Method                   Tags                       Date      Macro F1
InFi-Checker-Qwen        Model Category=Special...  2026.01   0.896
GPT-5                    Model Category=The Sta...  2026.01   0.877
MiniCheck                Model Category=Special...  2026.01   0.856
ClearCheck (COT)         Model Category=Special...  2026.01   0.854
Claude-3.7-Sonnet        Model Category=The Sta...  2026.01   0.837
o3                       Model Category=The Sta...  2026.01   0.833
GPT-4.1                  Model Category=The Sta...  2026.01   0.816
AlignScore-large         Model Category=Special...  2026.01   0.798
GPT-4o                   Model Category=The Sta...  2026.01   0.783
FactCG                   Model Category=Special...  2026.01   0.762
InFi-Checker-Llama       Model Category=Special...  2026.01   0.759
DeepSeek-V3.2-NoThink    Model Category=The Sta...  2026.01   0.754
Qwen3-8B                 Model Category=The Ope...  2026.01   0.661
Llama-3.1-8B-Instruct    Model Category=The Ope...  2026.01   0.636