
Med42-v2: A Suite of Clinical LLMs

About

Med42-v2 introduces a suite of clinical large language models (LLMs) designed to address the limitations of generic models in healthcare settings. These models are built on the Llama3 architecture and fine-tuned using specialized clinical data. They underwent multi-stage preference alignment to respond effectively to natural prompts. While generic models are often preference-aligned to avoid answering clinical queries as a precaution, Med42-v2 is specifically trained to overcome this limitation, enabling its use in clinical settings. Across various medical benchmarks, Med42-v2 models outperform the original Llama3 models, in both the 8B and 70B parameter configurations, as well as GPT-4. These LLMs are developed to understand clinical queries, perform reasoning tasks, and provide valuable assistance in clinical environments. The models are now publicly available at https://huggingface.co/m42-health.

Clément Christophe, Praveen K Kanithi, Tathagata Raha, Shadab Khan, Marco AF Pimentel • 2024

Related benchmarks

Task                                         Dataset           Metric         Result  Rank
Medical Question Answering                   MedMCQA           Accuracy       62.28   521
Medical Question Answering                   MedQA             Accuracy       59.78   153
Medical Question Answering                   PubMedQA          Accuracy       78.1    117
Question Answering                           MedQA             Accuracy       77.5    96
Question Answering                           MMLU              Accuracy       60.5    46
Medical Reasoning                            HealthBench Hard  Accuracy       17.21   41
Health-related dialogue and decision-making  HealthBench Main  Average Score  26.04   24
Medical Decision Making                      MedQA             Accuracy       55.96   23
Medical Decision Making                      MMLUPH (test)     Accuracy       44.84   23
Medical order extraction                     SIMORD (test)     Match Count    65.2    22
Showing 10 of 23 rows