PMC-LLaMA: Towards Building Open-source Language Models for Medicine
About
Recently, Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding. While demonstrating proficiency in everyday conversations and question-answering situations, these models frequently struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge. In this paper, we describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA. Our contributions are threefold: (i) we systematically investigate the process of adapting a general-purpose foundation language model to the medical domain, which involves data-centric knowledge injection through the integration of 4.8M biomedical academic papers and 30K medical textbooks, as well as comprehensive fine-tuning for alignment with domain-specific instructions; (ii) we contribute a large-scale, comprehensive dataset for instruction tuning, encompassing medical question answering (QA), rationales for reasoning, and conversational dialogues, and comprising 202M tokens in total; (iii) we conduct thorough ablation studies to demonstrate the effectiveness of each proposed component. When evaluated on various public medical question-answering benchmarks, our lightweight PMC-LLaMA, which consists of only 13 billion parameters, exhibits superior performance, even surpassing ChatGPT. All models, code, and datasets can be found at https://github.com/chaoyi-wu/PMC-LLaMA.
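To make the instruction-tuning stage concrete, the sketch below shows how a medical QA sample with a reasoning rationale might be serialized into a single training prompt. The `build_prompt` helper, the field names, and the `### Instruction / ### Input / ### Response` template are illustrative assumptions, not the exact format used by PMC-LLaMA.

```python
# Hypothetical serialization of one instruction-tuning sample (medical MCQ
# plus rationale) into a single prompt string. The template is an assumed
# Alpaca-style layout, not PMC-LLaMA's verified format.

def build_prompt(sample: dict) -> str:
    """Turn one QA sample into an instruction-tuning prompt string."""
    # Render the answer options as "A. ...", "B. ...", one per line.
    options = "\n".join(f"{k}. {v}" for k, v in sorted(sample["options"].items()))
    return (
        "### Instruction:\nAnswer the multiple-choice medical question "
        "and explain your reasoning.\n\n"
        f"### Input:\n{sample['question']}\n{options}\n\n"
        # The response pairs the rationale with the final answer letter, so the
        # model learns to reason before committing to a choice.
        f"### Response:\n{sample['rationale']} "
        f"The answer is {sample['answer']}."
    )

sample = {
    "question": "Which vitamin deficiency causes scurvy?",
    "options": {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D", "D": "Vitamin K"},
    "rationale": "Scurvy results from impaired collagen synthesis due to a lack of ascorbic acid.",
    "answer": "B",
}
print(build_prompt(sample))
```

In a setup like this, each serialized prompt would be tokenized and used as a standard causal-language-modeling target during fine-tuning.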
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Question Answering | MedMCQA | Accuracy | 26.6 | 253 |
| Question Answering | PubMedQA | Accuracy | 72.9 | 145 |
| Medical Question Answering | MedMCQA (test) | Accuracy | 23.5 | 134 |
| Question Answering | MedQA-USMLE (test) | Accuracy | 44.7 | 101 |
| Question Answering | PubMedQA (test) | Accuracy | 53.3 | 81 |
| Question Answering | MedQA | Accuracy | 25.5 | 70 |
| Question Answering | MedQA (test) | Accuracy | 27.6 | 61 |
| Multiple-choice Question Answering | MedQA 5 opts | Accuracy | 21.1 | 26 |
| Question Answering | PubMedQA PQA-L (test) | Accuracy | 73.4 | 25 |
| Multiple-choice Question Answering | MMLU Medical and Biological Sub-tasks | Clinical Knowledge Accuracy | 24.5 | 24 |