
medical QA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Multi-choice Medical Question Answering | Medical QA Multi-choice | MMLU-Med Accuracy: 70.7 | 22
Medical Question Answering | Out-of-domain medical QA History, Engineering, Law (test) | History Accuracy: 50.4 | 10
Medical Question Answering | Medical QA | GPT-4 Score: 92.5 | 9
Medical Question Answering | Medical QA Primary n=319 (test) | UCCR: 21.94 | 8
Medical Question Answering | Medical QA Primary Split n = 314 (dev) | UCCR: 20.7 | 8
Medical Question Answering | Medical QA offline evaluation | Honesty: 0.83 | 3