
MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data

About

As large language models (LLMs) like OpenAI's GPT series continue to make strides, we witness the emergence of artificial intelligence applications in an ever-expanding range of fields. In medicine, these LLMs hold considerable promise for improving medical workflows, diagnostics, patient care, and education. Yet, there is an urgent need for open-source models that can be deployed on-premises to safeguard patient privacy. In our work, we present an innovative dataset consisting of over 160,000 entries, specifically crafted to fine-tune LLMs for effective medical applications. We investigate the impact of fine-tuning publicly accessible pre-trained LLMs on these datasets, and we then compare the performance of pre-trained-only models with that of the fine-tuned models on the examinations that future medical doctors must pass to achieve certification.

Tianyu Han, Lisa C. Adams, Jens-Michalis Papaioannou, Paul Grundmann, Tom Oberhauser, Alexei Figueroa, Alexander Löser, Daniel Truhn, Keno K. Bressem • 2023

Related benchmarks

Task                                Dataset               Result           Rank
Medical Question Answering          MedMCQA               Accuracy 55.2    521
Question Answering                  PubMedQA (test)       Accuracy 56      170
Question Answering                  PubMedQA              Accuracy 73.6    145
Medical Question Answering          MedMCQA (test)        Accuracy 45.8    134
Question Answering                  MedQA                 Accuracy 55.2    96
Question Answering                  MedQA (test)          Accuracy 55.2    67
Medical Reasoning                   MedQA                 Accuracy 38.81   47
Multilingual Medical Reasoning      CUREMED-BENCH (test)  Consistency 3.5  33
Multiple-choice Question Answering  MedQA 5 opts          Accuracy 33.7    26
Clinical Reasoning                  MIMIC-CDM-FI          Accuracy 41      26
Showing 10 of 41 rows
