Scientific Fact-Checking on BIONLI 300
Top result: Atomic+Search, 68.7 Balanced Accuracy (Apr 13, 2026)
Metrics: Balanced Accuracy, Recall, F1 Score
Updated 5d ago
Evaluation Results
| Method | Settings | Date | Balanced Accuracy | Recall | F1 Score |
|---|---|---|---|---|---|
| Atomic+Search | search=enabled | 2026.04 | 68.7 | 65.4 | 66.7 |
| GPT-5 Mini + Search | search=enabled | 2026.04 | 66.9 | 62.7 | 65.8 |
| RARR | | 2026.04 | 66.4 | 62.3 | 65.3 |
| GPT-o1 | | 2026.04 | 65.8 | 60.2 | 64.9 |
| GPT-5 Mini | | 2026.04 | 62.9 | 58.7 | 61.8 |
| Qwen 32B MAD | | 2026.04 | 62.5 | 59.1 | 61.3 |
| MiniCheck | | 2026.04 | 61.35 | 59.8 | 60.7 |