Global Fact Consistency Verification on FEVER
[Chart: Precision, Recall, and F1 Score by date. Top Precision: 99.5, Direct (baseline), Jan 20, 2026]
Evaluation Results
Method             Backbone          Date     Precision  Recall  F1 Score
Direct (baseline)  Claude 4          2026.01  99.5       83.3    89.1
Direct (baseline)  Claude 3.7        2026.01  99.2       80.5    87.3
QXR                GPT-OSS-120B      2026.01  99.2       98      98.5
Direct (baseline)  DeepSeek-R1       2026.01  98.9       82.1    87.5
QXR                DeepSeek-R1       2026.01  98.8       98      98.3
QXR                Claude 3.7        2026.01  98.3       97.7    98
QXR                Claude 4          2026.01  98.1       97.7    97.8
Direct (baseline)  GPT-OSS-120B      2026.01  97.6       97.5    97.6
Direct (baseline)  Mistral Large...  2026.01  97         78      84.8
QXR                Mistral Large...  2026.01  96.4       99      97.6
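The table is sorted by Precision, which puts the Direct baselines near the top even though QXR scores higher on Recall and F1 for every backbone. The sketch below pairs the two methods by backbone and computes QXR minus Direct for each metric, using only the values listed in the table. The names `RESULTS` and `qxr_minus_direct` are hypothetical, chosen for illustration, and the truncated backbone name "Mistral Large..." is kept exactly as the page shows it.

```python
# Scores copied from the Evaluation Results table: (precision, recall, f1).
# RESULTS and qxr_minus_direct are hypothetical names used for illustration.
RESULTS = {
    "Claude 4": {"Direct": (99.5, 83.3, 89.1), "QXR": (98.1, 97.7, 97.8)},
    "Claude 3.7": {"Direct": (99.2, 80.5, 87.3), "QXR": (98.3, 97.7, 98.0)},
    "GPT-OSS-120B": {"Direct": (97.6, 97.5, 97.6), "QXR": (99.2, 98.0, 98.5)},
    "DeepSeek-R1": {"Direct": (98.9, 82.1, 87.5), "QXR": (98.8, 98.0, 98.3)},
    "Mistral Large...": {"Direct": (97.0, 78.0, 84.8), "QXR": (96.4, 99.0, 97.6)},
}


def qxr_minus_direct(results):
    """Return {backbone: (dP, dR, dF1)} for QXR relative to the Direct baseline."""
    deltas = {}
    for backbone, runs in results.items():
        direct, qxr = runs["Direct"], runs["QXR"]
        # Round to one decimal to match the table's precision and hide float noise.
        deltas[backbone] = tuple(round(q - d, 1) for q, d in zip(qxr, direct))
    return deltas


if __name__ == "__main__":
    for backbone, (dp, dr, df1) in qxr_minus_direct(RESULTS).items():
        print(f"{backbone:16s} dP={dp:+.1f} dR={dr:+.1f} dF1={df1:+.1f}")
```

Across these pairs, QXR gives up at most 1.4 points of Precision (Claude 4) and gains between 0.5 (GPT-OSS-120B) and 21 (Mistral Large...) points of Recall.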