Scientific Claim Verification on SciFact
[Chart: Accuracy by date. Top result: mistral-large, 40.5 Accuracy (Feb 15, 2026). Selectable metrics: Accuracy, F1 Score, AURC, R@0.8, R@0.9.]
Evaluation Results

Method         Model family   Date     Accuracy  F1 Score  AURC  R@0.8  R@0.9
mistral-large  Mistral,...    2026.02  40.5      19.2      22.1  50.6   54.9
deepseek-chat  DeepSeek       2026.02  40.0      19.1      22.4  50.6   56.1
llama-3.3-70b  Llama 3,...    2026.02  40.0      19.0      22.5  50.6   55.6
flan-t5-large  FLAN-T5,...    2026.02  39.5      18.9      22.8  49.6   55.1
gpt-4o-mini    GPT-4o, S...   2026.02  39.5      18.9      22.3  50.6   56.1
gpt-5.2-chat   GPT-5          2026.02  39.5      18.9      22.6  50.6   55.9
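The page reports AURC without defining it. Assuming it follows the common selective-prediction definition (area under the risk-coverage curve, where lower is better), the sketch below shows one discrete way to compute it from per-claim confidences and correctness flags. The function names `risk_coverage_curve` and `aurc` and the toy inputs are hypothetical and do not come from the benchmark; the leaderboard's own implementation, scaling, and tie handling may differ.

```python
from typing import List, Sequence, Tuple


def risk_coverage_curve(
    confidences: Sequence[float], correct: Sequence[bool]
) -> Tuple[List[float], List[float]]:
    """Sort predictions by descending confidence and return, for each cutoff k,
    the coverage k/n and the selective risk (error rate among the top-k)."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    n = len(confidences)
    # Indices from most to least confident; Python's sort is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    coverages: List[float] = []
    risks: List[float] = []
    errors = 0
    for k, i in enumerate(order, start=1):
        if not correct[i]:
            errors += 1
        coverages.append(k / n)
        risks.append(errors / k)
    return coverages, risks


def aurc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Discrete AURC (assumed definition): mean selective risk over coverages k/n."""
    _, risks = risk_coverage_curve(confidences, correct)
    return sum(risks) / len(risks)


# Toy example with made-up values (not SciFact data).
conf = [0.9, 0.8, 0.7, 0.6]
is_correct = [True, True, False, True]
print(round(aurc(conf, is_correct), 4))  # prints 0.1458
```

If the leaderboard reports scores on a 0 to 100 scale, the result would need to be multiplied by 100 before comparing it with the AURC column.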