Fact-Checking on FEVER (Balanced Accuracy)
Best result: 91.9 balanced accuracy (WKGFC)
[Chart: Balanced Accuracy over time, dated Feb 27, 2026]
Evaluation Results
Method              Method Category   Date      Balanced Accuracy
WKGFC               Specia...         2026.02   91.9
FIRE                Specia...         2026.02   90.6
HerO                Specia...         2026.02   67.5
Qwen2.5 72B         Small-...         2026.02   58.1
Llama3.3 70B        Small-...         2026.02   57.4
Claude 3.5-Sonnet   Large-...         2026.02   57.1
GPT-4o              Large-...         2026.02   55.3
Gemini-2.5-flash    Large-...         2026.02   54.8
DeepSeek-V3 67B     Large-...         2026.02   53.5
GPT-4               Large-...         2026.02   51.4
Qwen2.5 7B          Small-...         2026.02   50.1
Llama3 8B           Small-...         2026.02   48.2
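The page does not define the metric. Balanced accuracy is commonly the unweighted mean of per-class recall, so a model cannot score well just by favoring the most frequent label. Below is a minimal Python sketch that assumes this standard, non-adjusted definition (the same one scikit-learn's balanced_accuracy_score uses) and FEVER's three labels (SUPPORTS, REFUTES, NOT ENOUGH INFO). Whether this leaderboard scores all three labels, and whether its values are simply this quantity times 100, are assumptions. The toy data is illustrative and does not come from the leaderboard.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class recall over the classes present in y_true.

    Assumption: standard (non-adjusted) definition; the leaderboard's exact
    scoring protocol is not stated on the page.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one example")
    gold_counts = defaultdict(int)  # number of gold examples per class
    hits = defaultdict(int)         # correctly predicted examples per class
    for gold, pred in zip(y_true, y_pred):
        gold_counts[gold] += 1
        if pred == gold:
            hits[gold] += 1
    # Recall per gold class, then a plain (unweighted) average across classes.
    recalls = [hits[c] / gold_counts[c] for c in gold_counts]
    return sum(recalls) / len(recalls)


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Plain accuracy, for comparison."""
    return sum(g == p for g, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy FEVER-style labels (hypothetical data, label-imbalanced on purpose).
    y_true = ["SUPPORTS"] * 6 + ["REFUTES"] * 2 + ["NOT ENOUGH INFO"] * 2
    # A degenerate model that always predicts the majority label.
    y_pred = ["SUPPORTS"] * 10
    print(f"accuracy          = {100 * accuracy(y_true, y_pred):.1f}")           # 60.0
    print(f"balanced accuracy = {100 * balanced_accuracy(y_true, y_pred):.1f}")  # 33.3
```

On this toy set, the always-SUPPORTS model reaches 60.0 plain accuracy but only 33.3 balanced accuracy. Its recall on SUPPORTS is 1.0 and on the other two labels is 0.0, which shows why balanced accuracy is the stricter measure on label-imbalanced data.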