Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Med42-v2: A Suite of Clinical LLMs

About

Med42-v2 introduces a suite of clinical large language models (LLMs) designed to address the limitations of generic models in healthcare settings. These models are built on Llama3 architecture and fine-tuned using specialized clinical data. They underwent multi-stage preference alignment to effectively respond to natural prompts. While generic models are often preference-aligned to avoid answering clinical queries as a precaution, Med42-v2 is specifically trained to overcome this limitation, enabling its use in clinical settings. Med42-v2 models demonstrate superior performance compared to the original Llama3 models in both 8B and 70B parameter configurations and GPT-4 across various medical benchmarks. These LLMs are developed to understand clinical queries, perform reasoning tasks, and provide valuable assistance in clinical environments. The models are now publicly available at \href{https://huggingface.co/m42-health}{https://huggingface.co/m42-health}.

Cl\'ement Christophe, Praveen K Kanithi, Tathagata Raha, Shadab Khan, Marco AF Pimentel• 2024

Related benchmarks

TaskDatasetResultRank
Medical Question AnsweringMedMCQA
Accuracy62.28
346
Medical Question AnsweringMedQA
Accuracy59.78
153
Question AnsweringMedQA
Accuracy77.5
96
Medical Question AnsweringPubMedQA
Accuracy78.1
92
Question AnsweringMMLU
Accuracy60.5
46
Medical ReasoningHealthBench Hard
Accuracy17.21
41
Health-related dialogue and decision-makingHealthBench Main
Average Score26.04
22
Medical order extractionSIMORD (test)
Match Count65.2
22
Medical Data and Knowledge ProcessingEHRStruct eICU
D-U1 Accuracy20
20
Data-Driven Structured EHR Understanding and ReasoningSynthea
D-R2 Accuracy17
19
Showing 10 of 16 rows

Other info

Follow for update