BioMamba: Domain-Adaptive Biomedical Language Models
About
Background: Biomedical language models should improve performance on biomedical text while retaining general-domain language ability. For Mamba-based models, this trade-off has not been clearly studied across biomedical literature and clinical text.

Methods: We developed BioMamba, a family of biomedical models obtained by continued pretraining of public Mamba2 checkpoints on PubMed, with small amounts of general-domain data from the Colossal Clean Crawled Corpus (C4) and Wikipedia included to help preserve general-domain language ability. Across multiple model scales, we evaluated language modeling and three downstream tasks: clinical note completion, discharge summary generation, and biomedical yes/no question answering.

Results: BioMamba consistently improved PubMed modeling, improved Wikipedia modeling, and left C4 performance largely unchanged. After supervised fine-tuning (SFT), BioMamba transferred well to both biomedical literature and clinical text, yielding strong results on completion, summarization, and question answering. On MIMIC-IV, BioMamba+SFT consistently matched or exceeded SFT from the corresponding base checkpoints on both note completion and discharge summary generation. The strongest model achieved a PubMed perplexity of 5.28 and accuracies of 90.24% and 73.00% on BioASQ and PubMedQA, respectively.

Conclusion: A balanced domain-adaptive pretraining strategy strengthens Mamba language models for both biomedical literature and clinical text while preserving general-domain language capabilities, establishing BioMamba as a practical foundation for biomedical NLP applications.
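The language-modeling results above are reported as perplexity. As a reminder of what that number means, the sketch below computes perplexity as the exponential of the mean per-token negative log-likelihood. It is only an illustration under stated assumptions: the function name `perplexity` and the example log-probabilities are hypothetical, and the tokenization and context-length choices behind the reported figures are not described here.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical per-token probabilities, for illustration only (not model output).
example_log_probs = [math.log(0.25), math.log(0.5), math.log(0.125)]
example_ppl = perplexity(example_log_probs)
```

Lower is better: a model that assigned every token a probability of 1/5 would score a perplexity of 5, so the reported value of 5.28 corresponds, loosely, to choosing among about five equally likely tokens at each step.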
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Modeling | C4 | Perplexity 14.91 | 1071 |
| Question Answering | PubMedQA (test) | -- | 128 |
| Language Modeling | PubMed | Perplexity 6.52 | 38 |
| Language Modeling | Wikipedia | Perplexity 9.71 | 35 |
| Discharge summary generation | MIMIC-IV (test) | ROUGE-1 10.11 | 21 |
| Note completion | MIMIC-IV (test) | ROUGE-1 8.11 | 21 |
| Biomedical Natural Language Processing | Biomedical NLP Benchmarks | F1 Score 88 | 6 |
| Question Answering | BioASQ | Accuracy 90.24 | 5 |
| Question Answering | PubMedQA | Accuracy 73 | 5 |