Med42-v2: A Suite of Clinical LLMs
About
Med42-v2 is a suite of clinical large language models (LLMs) designed to address the limitations of generic models in healthcare settings. The models are built on the Llama-3 architecture, fine-tuned on specialized clinical data, and preference-aligned in multiple stages so that they respond effectively to natural prompts. Whereas generic models are often preference-aligned to refuse clinical queries as a precaution, Med42-v2 is specifically trained to overcome this limitation, enabling its use in clinical settings. In both the 8B and 70B parameter configurations, Med42-v2 outperforms the corresponding Llama-3 models as well as GPT-4 across various medical benchmarks. These LLMs are developed to understand clinical queries, perform reasoning tasks, and provide valuable assistance in clinical environments. The models are publicly available at [https://huggingface.co/m42-health](https://huggingface.co/m42-health).
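Since the models are released on the Hugging Face Hub, they can be queried like any instruction-tuned Llama-3 checkpoint. Below is a minimal sketch using the `transformers` text-generation pipeline; the model ID, system prompt, and generation settings are illustrative assumptions, not values specified in this document, so check the model card on the Hub before use.

```python
# Sketch: querying a Med42-v2 model with the Hugging Face transformers pipeline.
# The model ID below is assumed from the m42-health Hub organization; verify it
# on https://huggingface.co/m42-health before running.

def build_messages(question: str) -> list[dict]:
    """Wrap a clinical question in the chat format expected by
    instruction-tuned (chat) models."""
    return [
        # Illustrative system prompt, not one prescribed by the Med42-v2 authors.
        {"role": "system", "content": "You are a helpful, honest medical assistant."},
        {"role": "user", "content": question},
    ]

# Set to True to actually download the weights and run generation
# (the 8B model requires a GPU with sufficient memory).
RUN_MODEL = False

if RUN_MODEL:
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="m42-health/Llama3-Med42-8B",  # 70B variant also available
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    out = pipe(
        build_messages("What are first-line treatments for type 2 diabetes?"),
        max_new_tokens=256,
    )
    # The pipeline returns the full chat transcript; the last message
    # is the model's reply.
    print(out[0]["generated_text"][-1]["content"])
```

The chat-message structure matters here: because Med42-v2 was preference-aligned on natural prompts, it is meant to be addressed through the chat template rather than with raw text completion.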
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Question Answering | MedMCQA | Accuracy | 62.28 | 253 |
| Medical Question Answering | MedQA | Accuracy | 59.78 | 109 |
| Medical Question Answering | PubMedQA | Accuracy | 78.1 | 45 |
| Medical order extraction | SIMORD (test) | Match Count | 65.2 | 22 |
| Clinical Diagnostic Reasoning | Clinical Diagnostic Reasoning Benchmark 1.0 (test) | ICD Recall | 27.87 | 13 |
| Biomedical Question Answering | Four biomedical QA datasets, macro-averaged (test) | Faithfulness | 85.3 | 4 |