
PMC-LLaMA: Towards Building Open-source Language Models for Medicine

About

Recently, Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding. While proficient in everyday conversation and question answering, these models often struggle in domains that demand precision, such as medicine, because they lack domain-specific knowledge. In this paper, we describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA. Our contributions are threefold: (i) we systematically investigate adapting a general-purpose foundation language model to the medical domain, which involves data-centric knowledge injection through the integration of 4.8M biomedical academic papers and 30K medical textbooks, as well as comprehensive fine-tuning for alignment with domain-specific instructions; (ii) we contribute a large-scale, comprehensive dataset for instruction tuning, encompassing medical question answering (QA), rationales for reasoning, and conversational dialogues, totaling 202M tokens; (iii) we conduct thorough ablation studies to demonstrate the effectiveness of each proposed component. When evaluated on various public medical question-answering benchmarks, our lightweight PMC-LLaMA, with only 13 billion parameters, exhibits superior performance, even surpassing ChatGPT. All models, code, and datasets are available at https://github.com/chaoyi-wu/PMC-LLaMA.
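The instruction-tuning stage described above pairs medical questions with rationales and final answers. A minimal sketch of assembling such (prompt, response) training pairs is below; the record fields and prompt template are illustrative assumptions, not the paper's exact format:

```python
# Build instruction-tuning pairs for medical QA with rationales.
# The record fields and the prompt template below are illustrative
# assumptions, not the exact format used by PMC-LLaMA.

def build_pair(record):
    """Turn one QA record into a (prompt, response) training pair."""
    options = "\n".join(
        f"{label}. {text}" for label, text in sorted(record["options"].items())
    )
    prompt = (
        "### Instruction:\nAnswer the multiple-choice medical question.\n\n"
        f"### Question:\n{record['question']}\n{options}\n\n### Response:\n"
    )
    # Supervise on the rationale followed by the final answer letter,
    # so the model learns to reason before answering.
    response = f"{record['rationale']} The answer is {record['answer']}."
    return prompt, response

sample = {
    "question": "Which vitamin deficiency causes scurvy?",
    "options": {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D"},
    "answer": "B",
    "rationale": "Scurvy results from insufficient vitamin C (ascorbic acid).",
}
prompt, response = build_pair(sample)
```

Keeping the rationale inside the supervised response (rather than only the answer letter) is what lets instruction tuning teach reasoning, not just option selection.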

Chaoyi Wu, Weixiong Lin, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie • 2023

Related benchmarks

Task                                  Dataset                                  Metric                        Result  Rank
Medical Question Answering            MedMCQA                                  Accuracy                      26.6    253
Question Answering                    PubMedQA                                 Accuracy                      72.9    145
Medical Question Answering            MedMCQA (test)                           Accuracy                      23.5    134
Question Answering                    MedQA-USMLE (test)                       Accuracy                      44.7    101
Question Answering                    PubMedQA (test)                          Accuracy                      53.3    81
Question Answering                    MedQA                                    Accuracy                      25.5    70
Question Answering                    MedQA (test)                             Accuracy                      27.6    61
Multiple-choice Question Answering    MedQA 5 opts                             Accuracy                      21.1    26
Question Answering                    PubMedQA PQA-L (test)                    Accuracy                      73.4    25
Multiple-choice Question Answering    MMLU Medical and Biological Sub-tasks    Clinical Knowledge Accuracy   24.5    24
(Showing 10 of 27 benchmark rows.)
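All of the benchmark entries above report accuracy on multiple-choice QA: the fraction of questions whose predicted option letter matches the gold answer. A minimal sketch of that metric (the list-of-letters data layout is an assumption; real harnesses load predictions from model outputs):

```python
# Multiple-choice QA accuracy: fraction of questions where the
# predicted option letter equals the gold answer letter.

def accuracy(predictions, answers):
    """predictions and answers are equal-length lists of option letters."""
    if len(predictions) != len(answers):
        raise ValueError("prediction/answer length mismatch")
    if not answers:
        raise ValueError("empty evaluation set")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Toy example: 3 of 4 predictions match the gold answers -> 0.75.
acc = accuracy(["A", "C", "B", "D"], ["A", "C", "B", "A"])
```

Note that random guessing on a 4-option benchmark like MedMCQA yields about 25% accuracy, which is why scores in the mid-20s sit low in the leaderboard rankings.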
