
BioMamba: Domain-Adaptive Biomedical Language Models

About

Background: Biomedical language models should improve performance on biomedical text while retaining general-domain language ability. For Mamba-based models, this trade-off has not been clearly studied across biomedical literature and clinical text.

Methods: We developed BioMamba, a family of biomedical models obtained by continued pretraining of public Mamba2 checkpoints on PubMed, with small amounts of general-domain data from the Colossal Clean Crawled Corpus (C4) and Wikipedia included to help preserve general-domain language ability. Across multiple model scales, we evaluated language modeling and three downstream tasks: clinical note completion, discharge summary generation, and biomedical yes/no question answering.

Results: BioMamba consistently improved PubMed modeling, improved Wikipedia modeling, and left C4 performance largely unchanged. After supervised fine-tuning (SFT), BioMamba transferred well to both biomedical literature and clinical text, yielding strong results on completion, summarization, and question answering. On MIMIC-IV, BioMamba+SFT consistently matched or exceeded SFT from the corresponding base checkpoints on note completion and discharge summary generation. The strongest model achieved a PubMed perplexity of 5.28 and accuracies of 90.24% and 73.00% on BioASQ and PubMedQA, respectively.

Conclusion: A balanced domain-adaptive pretraining strategy strengthens Mamba language models for both biomedical literature and clinical text while preserving general-domain language capabilities, establishing BioMamba as a practical foundation for biomedical NLP applications.
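The continued-pretraining recipe above mixes PubMed with small amounts of C4 and Wikipedia data. A minimal sketch of such a weighted domain mixture follows; the tiny corpus pools and the 90/5/5 sampling ratio are illustrative assumptions only, since the abstract does not state the actual mixing proportions.

```python
import random

def mixed_domain_stream(corpora, weights, seed=0):
    """Yield (domain, document) pairs by sampling a domain per step.

    `corpora` maps a domain name to a pool of documents; `weights` gives
    each domain's relative sampling probability. The ratios used for
    BioMamba are not stated in the abstract -- values here are illustrative.
    """
    rng = random.Random(seed)
    names = list(corpora)
    probs = [weights[n] for n in names]
    while True:
        # Pick a domain according to the mixture weights, then a document.
        name = rng.choices(names, weights=probs, k=1)[0]
        yield name, rng.choice(corpora[name])

# Illustrative pools and ratios (NOT the paper's actual data or mixture).
corpora = {
    "pubmed": ["abstract 1", "abstract 2"],
    "c4": ["web doc 1"],
    "wikipedia": ["wiki article 1"],
}
stream = mixed_domain_stream(
    corpora, {"pubmed": 0.90, "c4": 0.05, "wikipedia": 0.05}
)
sampled = [next(stream)[0] for _ in range(1000)]
```

Keeping the general-domain slices small is what lets the PubMed distribution dominate the update while still exposing the model to C4/Wikipedia-style text.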

Ling Yue, Mingzhi Zhu, Sixue Xing, Shaowu Pan, Vijil Chenthamarakshan, Yanbo Wang, Yunning Cao, Payel Das, Tianfan Fu (2024)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | C4 | Perplexity | 14.91 | 1071 |
| Question Answering | PubMedQA (test) | -- | -- | 128 |
| Language Modeling | PubMed | Perplexity | 6.52 | 38 |
| Language Modeling | Wikipedia | Perplexity | 9.71 | 35 |
| Discharge summary generation | MIMIC-IV (test) | ROUGE-1 | 10.11 | 21 |
| Note completion | MIMIC-IV (test) | ROUGE-1 | 8.11 | 21 |
| Biomedical Natural Language Processing | Biomedical NLP Benchmarks | F1 Score | 88 | 6 |
| Question Answering | BioASQ | Accuracy | 90.24 | 5 |
| Question Answering | PubMedQA | Accuracy | 73 | 5 |
