Med42-v2: A Suite of Clinical LLMs

About

Med42-v2 introduces a suite of clinical large language models (LLMs) designed to address the limitations of generic models in healthcare settings. These models are built on the Llama3 architecture and fine-tuned using specialized clinical data. They underwent multi-stage preference alignment to respond effectively to natural prompts. While generic models are often preference-aligned to avoid answering clinical queries as a precaution, Med42-v2 is specifically trained to overcome this limitation, enabling its use in clinical settings. Across various medical benchmarks, Med42-v2 models outperform both the original Llama3 models, in the 8B and 70B parameter configurations, and GPT-4. These LLMs are developed to understand clinical queries, perform reasoning tasks, and provide valuable assistance in clinical environments. The models are now publicly available at https://huggingface.co/m42-health.

Clément Christophe, Praveen K Kanithi, Tathagata Raha, Shadab Khan, Marco AF Pimentel • 2024

Related benchmarks

Task	Dataset	Result
Medical Question Answering	MedMCQA	Accuracy 62.28	521
Medical Question Answering	MedQA	Accuracy 59.78	153
Medical Question Answering	PubMedQA	Accuracy 78.1	117
Question Answering	MedQA	Accuracy 77.5	96
Question Answering	MMLU	Accuracy 60.5	46
Medical Reasoning	HealthBench Hard	Accuracy 17.21	41
Health-related dialogue and decision-making	HealthBench Main	Average Score 26.04	24
Medical Decision Making	MedQA	Accuracy 55.96	23
Medical Decision Making	MMLUPH (test)	Accuracy 44.84	23
Medical order extraction	SIMORD (test)	Match Count 65.2	22

Showing 10 of 23 rows
