Scientific Fact-Checking on PubMedFact1k 3-way
Top result: Atomic+Search, 73.7 Macro F1
[Chart: Macro F1 by date; point shown at Apr 13, 2026]
Updated 4d ago
Evaluation Results
Method               Settings        Date     Macro F1
Atomic+Search        search=enabled  2026.04  73.7
GPT-5 Mini + Search  search=enabled  2026.04  72.5
RARR                 -               2026.04  72.3
GPT-o1               -               2026.04  71.2
GPT-5 Mini           -               2026.04  68.5
Qwen 32B MAD         -               2026.04  61.8