MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data

About

As large language models (LLMs) like OpenAI's GPT series continue to make strides, we witness the emergence of artificial intelligence applications in an ever-expanding range of fields. In medicine, these LLMs hold considerable promise for improving medical workflows, diagnostics, patient care, and education. Yet, there is an urgent need for open-source models that can be deployed on-premises to safeguard patient privacy. In our work, we present an innovative dataset consisting of over 160,000 entries, specifically crafted to fine-tune LLMs for effective medical applications. We investigate the impact of fine-tuning publicly accessible pre-trained LLMs on these datasets, and we then compare the performance of pre-trained-only models against the fine-tuned models on the examinations that future medical doctors must pass to achieve certification.

Tianyu Han, Lisa C. Adams, Jens-Michalis Papaioannou, Paul Grundmann, Tom Oberhauser, Alexei Figueroa, Alexander Löser, Daniel Truhn, Keno K. Bressem • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| Medical Question Answering | MedMCQA | Accuracy 45.8 | 253 |
| Question Answering | PubMedQA | Accuracy 73.6 | 145 |
| Medical Question Answering | MedMCQA (test) | Accuracy 45.8 | 134 |
| Question Answering | PubMedQA (test) | Accuracy 56 | 81 |
| Question Answering | MedQA | Accuracy 55.2 | 70 |
| Question Answering | MedQA (test) | Accuracy 55.2 | 61 |
| Multilingual Medical Reasoning | CUREMED-BENCH (test) | Consistency 3.5 | 33 |
| Multiple-choice Question Answering | MedQA 5 opts | Accuracy 33.7 | 26 |
| Multiple-choice Question Answering | MMLU Medical and Biological Sub-tasks | Clinical Knowledge Accuracy 53.1 | 24 |
| Truthfulness Evaluation | TruthfulQA medical (test) | Health Score 41.8 | 22 |
Showing 10 of 32 rows
