
Medicine

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Medical Reasoning | Medicine MedQA M-Med | MedQA Score: 75.2 | 40 |
| Medical Reasoning | Medicine MedBullets, MedXQA | Accuracy (MedBullets): 84.87 | 18 |
| Multi-hop Question Answering | Medicine | F1 Score: 52.94 | 14 |
| Medical Question Answering | Medicine | Accuracy: 48 | 13 |
| Knowledge Depth Evaluation | Medicine | EVD Score: 7.55 | 5 |