Binary Fact-checking on MeetingBank
Best result: GPT-5, 87.6 Macro-F1 (Jan 10, 2026)
Evaluation Results
| Method | Model Category | Date | Macro-F1 |
|---|---|---|---|
| GPT-5 | The Sta... | 2026.01 | 87.6 |
| GPT-4.1 | The Sta... | 2026.01 | 86.3 |
| Claude-3.7-Sonnet | The Sta... | 2026.01 | 84 |
| o3 | The Sta... | 2026.01 | 83.8 |
| DeepSeek-V3.2-NoThink | The Sta... | 2026.01 | 82.9 |
| InFi-Checker-Qwen | Special... | 2026.01 | 78.5 |
| MiniCheck | Special... | 2026.01 | 77.8 |
| GPT-4o | The Sta... | 2026.01 | 76.9 |
| AlignScore-large | Special... | 2026.01 | 76.5 |
| ClearCheck (COT) | Special... | 2026.01 | 75.8 |
| Qwen3-8B | The Ope... | 2026.01 | 74.2 |
| FactCG | Special... | 2026.01 | 71.9 |
| InFi-Checker-Llama | Special... | 2026.01 | 65.8 |
| Llama-3.1-8B-Instruct | The Ope... | 2026.01 | 62.3 |