
Binary Fact-checking on REVEAL

Top result: 93.7 Macro-F1 (GPT-5)

Updated 5mo ago

Evaluation Results

Method                   Date      Macro-F1
GPT-5                    2026.01   93.7
GPT-4.1                  2026.01   93.2
o3                       2026.01   92.2
AlignScore-large         2026.01   92.2
DeepSeek-V3.2-NoThink    2026.01   91
MiniCheck                2026.01   91
FactCG                   2026.01   90
InFi-Checker-Qwen        2026.01   90
Claude-3.7-Sonnet        2026.01   88
InFi-Checker-Llama       2026.01   87.7
ClearCheck (COT)         2026.01   87
GPT-4o                   2026.01   86.9
Qwen3-8B                 2026.01   83.2
Llama-3.1-8B-Instruct    2026.01   78.2
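The scores above are Macro-F1 on a binary task. A minimal sketch of how binary Macro-F1 is usually computed is shown below: it is the unweighted mean of the F1 score for each of the two classes. This is not the leaderboard's actual evaluation script. The label encoding (1 = supported, 0 = not supported) and the helper names `f1_for_class` and `macro_f1` are illustrative assumptions.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> float:
    """F1 score treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: F1 is 0 by convention (precision or recall is 0/undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over both labels.

    Assumed encoding (hypothetical): 1 = supported, 0 = not supported.
    """
    return (f1_for_class(y_true, y_pred, 1) + f1_for_class(y_true, y_pred, 0)) / 2


if __name__ == "__main__":
    # Imbalanced toy example: 4 supported claims, 2 unsupported.
    y_true = [1, 1, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 1, 1, 0]
    # Class 1: F1 = 8/9; class 0: F1 = 2/3; macro = 7/9.
    print(round(100 * macro_f1(y_true, y_pred), 1))  # prints 77.8
```

In this toy example, accuracy is 5/6 (about 83.3), but Macro-F1 is only 77.8. Macro-F1 weights the minority class equally with the majority class, so a checker cannot score well just by favoring the more frequent label. The same quantity is available as `sklearn.metrics.f1_score(y_true, y_pred, average="macro")`. The leaderboard values appear to be this score multiplied by 100.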