Biomedical Question Answering on Four biomedical QA datasets macro-averaged (test)
[Chart: Faithfulness. Best result: 85.3 (Med42-Llama3-8B), Jan 10, 2026. Selectable metrics: Faithfulness, Hallucination Rate, Safety Error Rate.]
Updated 4d ago
Evaluation Results

| Method | Framework | Date | Faithfulness | Hallucination Rate | Safety Error Rate |
|---|---|---|---|---|---|
| Med42-Llama3-8B | MedRAGChecker | 2026.01 | 85.3 | 6.3 | 6.8 |
| Med-Qwen2-7B | MedRAGChecker | 2026.01 | 81.4 | 8 | 7.7 |
| Meditron3-8B | MedRAGChecker | 2026.01 | 71.5 | 7.6 | 8.2 |
| PMC-LLaMA-13B | MedRAGChecker | 2026.01 | 60.1 | 10.7 | 11.3 |
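
The results above are listed by Faithfulness, but the ordering is not the same for every metric. The following is a minimal Python sketch that re-ranks the four methods per metric. It assumes that higher Faithfulness is better and that lower Hallucination Rate and Safety Error Rate are better; the page does not state these directions explicitly. The `Result` class and `rank` helper are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One leaderboard row (values copied from the results above)."""
    method: str
    faithfulness: float
    hallucination_rate: float
    safety_error_rate: float


RESULTS = [
    Result("Med42-Llama3-8B", 85.3, 6.3, 6.8),
    Result("Med-Qwen2-7B", 81.4, 8.0, 7.7),
    Result("Meditron3-8B", 71.5, 7.6, 8.2),
    Result("PMC-LLaMA-13B", 60.1, 10.7, 11.3),
]


def rank(results, metric, higher_is_better):
    """Return method names ordered from best to worst on `metric`.

    `higher_is_better` is an assumption supplied by the caller;
    the leaderboard itself does not state metric directions.
    """
    ordered = sorted(
        results,
        key=lambda r: getattr(r, metric),
        reverse=higher_is_better,
    )
    return [r.method for r in ordered]


if __name__ == "__main__":
    print("Faithfulness:      ", rank(RESULTS, "faithfulness", True))
    print("Hallucination Rate:", rank(RESULTS, "hallucination_rate", False))
    print("Safety Error Rate: ", rank(RESULTS, "safety_error_rate", False))
```

Under these assumptions, Med42-Llama3-8B ranks first on all three metrics, while Meditron3-8B moves ahead of Med-Qwen2-7B on Hallucination Rate (7.6 vs. 8) despite its lower Faithfulness.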