PMC-LLaMA: Towards Building Open-source Language Models for Medicine
About
Recently, Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding. While demonstrating proficiency in everyday conversations and question-answering situations, these models frequently struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge. In this paper, we describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA. Our contributions are threefold: (i) we systematically investigate the process of adapting a general-purpose foundation language model to the medical domain, which involves data-centric knowledge injection through the integration of 4.8M biomedical academic papers and 30K medical textbooks, as well as comprehensive fine-tuning for alignment with domain-specific instructions; (ii) we contribute a large-scale, comprehensive dataset for instruction tuning, encompassing medical question answering (QA), rationales for reasoning, and conversational dialogues, and comprising 202M tokens in total; (iii) we conduct thorough ablation studies to demonstrate the effectiveness of each proposed component. When evaluated on various public medical question-answering benchmarks, our lightweight PMC-LLaMA, which consists of only 13 billion parameters, exhibits superior performance, even surpassing ChatGPT. All models, code, and datasets can be found at https://github.com/chaoyi-wu/PMC-LLaMA.
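To make the instruction-tuning stage concrete, the sketch below shows how a medical QA sample with a reasoning rationale might be serialized into a single training prompt. The `build_prompt` helper, the field names, and the `### Instruction / ### Input / ### Response` template are illustrative assumptions, not the exact format used by PMC-LLaMA.

```python
# Hypothetical serialization of one instruction-tuning sample (medical MCQ
# plus rationale) into a single prompt string. The template is an assumed
# Alpaca-style layout, not PMC-LLaMA's verified format.

def build_prompt(sample: dict) -> str:
    """Turn one QA sample into an instruction-tuning prompt string."""
    # Render the answer options as "A. ...", "B. ...", one per line.
    options = "\n".join(f"{k}. {v}" for k, v in sorted(sample["options"].items()))
    return (
        "### Instruction:\nAnswer the multiple-choice medical question "
        "and explain your reasoning.\n\n"
        f"### Input:\n{sample['question']}\n{options}\n\n"
        # The response pairs the rationale with the final answer letter, so the
        # model learns to reason before committing to a choice.
        f"### Response:\n{sample['rationale']} "
        f"The answer is {sample['answer']}."
    )

sample = {
    "question": "Which vitamin deficiency causes scurvy?",
    "options": {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D", "D": "Vitamin K"},
    "rationale": "Scurvy results from impaired collagen synthesis due to a lack of ascorbic acid.",
    "answer": "B",
}
print(build_prompt(sample))
```

In a setup like this, each serialized prompt would be tokenized and used as a standard causal-language-modeling target during fine-tuning.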
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Question Answering | MedMCQA | Accuracy | 26.6 | 253 |
| Question Answering | PubMedQA | Accuracy | 72.9 | 145 |
| Medical Question Answering | MedMCQA (test) | Accuracy | 23.5 | 134 |
| Question Answering | MedQA-USMLE (test) | Accuracy | 44.7 | 101 |
| Question Answering | PubMedQA (test) | Accuracy | 53.3 | 81 |
| Question Answering | MedQA | Accuracy | 25.5 | 70 |
| Question Answering | MedQA (test) | Accuracy | 27.6 | 61 |
| Multiple-choice Question Answering | MedQA 5 opts | Accuracy | 21.1 | 26 |
| Question Answering | PubMedQA PQA-L (test) | Accuracy | 73.4 | 25 |
| Multiple-choice Question Answering | MMLU Medical and Biological Sub-tasks | Clinical Knowledge Accuracy | 24.5 | 24 |