BioMamba: Domain-Adaptive Biomedical Language Models

About

Background: Biomedical language models should improve performance on biomedical text while retaining general-domain language ability. For Mamba-based models, this trade-off has not been clearly studied across biomedical literature and clinical text.

Methods: We developed BioMamba, a family of biomedical models obtained by continued pretraining of public Mamba2 checkpoints on PubMed, with small amounts of general-domain data from the Colossal Clean Crawled Corpus (C4) and Wikipedia included to help preserve general-domain language ability. We evaluated language modeling and three downstream tasks across multiple model scales: clinical note completion, discharge summary generation, and biomedical yes/no question answering.

Results: BioMamba consistently improved PubMed modeling, improved Wikipedia modeling, and left C4 performance largely unchanged. After supervised fine-tuning (SFT), BioMamba transferred well to both biomedical literature and clinical text, yielding strong results on completion, summarization, and question answering. On MIMIC-IV, BioMamba+SFT consistently matched or exceeded SFT from the corresponding base checkpoints across note completion and discharge summary generation. The strongest model achieved a PubMed perplexity of 5.28 and accuracies of 90.24% and 73.00% on BioASQ and PubMedQA, respectively.

Conclusion: A balanced domain-adaptive pretraining strategy strengthens Mamba language models for both biomedical literature and clinical text while preserving general-domain language capabilities, establishing BioMamba as a practical foundation for biomedical NLP applications.

Ling Yue, Mingzhi Zhu, Sixue Xing, Shaowu Pan, Vijil Chenthamarakshan, Yanbo Wang, Yunning Cao, Payel Das, Tianfan Fu • 2024

Related benchmarks

Task	Dataset	Metric	Result
Language Modeling	C4	Perplexity	14.91	1688
Question Answering	PubMedQA (test)	--	--	170
Language Modeling	PubMed	Perplexity	6.52	59
Language Modeling	Wikipedia	Perplexity	9.71	43
Discharge summary generation	MIMIC-IV (test)	ROUGE-1	10.11	21
Note completion	MIMIC-IV (test)	ROUGE-1	8.11	21
Biomedical Natural Language Processing	Biomedical NLP Benchmarks	F1 Score	88	6
Question Answering	BioASQ	Accuracy	90.24	5
Question Answering	PubMedQA	Accuracy	73	5
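
For readers unfamiliar with the perplexity figures above: perplexity is conventionally the exponential of the mean per-token negative log-likelihood, so lower is better. The sketch below illustrates that standard definition only. It is not the authors' evaluation code; the function name `perplexity` and the example values are hypothetical, and it assumes natural-log likelihoods averaged over tokens.

```python
import math


def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log).

    Illustrative helper, not taken from the BioMamba code base.
    """
    if not token_nlls:
        raise ValueError("need at least one token")
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)


# Hypothetical example: a model that assigns probability 1/4 to every
# observed token has a per-token NLL of log(4), giving a perplexity of
# about 4 (as uncertain as a uniform choice among four tokens).
uniform_nlls = [math.log(4.0)] * 3
print(perplexity(uniform_nlls))
```

Under this reading, a perplexity of 6.52 on PubMed means the model is, on average, about as uncertain as a uniform choice among roughly 6.5 tokens at each position.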
