
medical QA

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multi-choice Medical Question Answering | Medical QA Multi-choice | MMLU-Med Accuracy | 70.7 | 22 |
| Medical Question Answering | Out-of-domain medical QA: History, Engineering, Law (test) | History Accuracy | 50.4 | 10 |
| Medical Question Answering | Medical QA | GPT-4 Score | 92.5 | 9 |
| Medical Question Answering | Medical QA offline evaluation | Honesty | 0.83 | 3 |