
Clinical ModernBERT: An efficient and long context encoder for biomedical text

About

We introduce Clinical ModernBERT, a transformer-based encoder pretrained on large-scale biomedical literature, clinical notes, and medical ontologies, incorporating PubMed abstracts, MIMIC-IV clinical data, and medical codes with their textual descriptions. Our model builds on ModernBERT, the current state-of-the-art natural language text encoder, which features architectural upgrades such as rotary positional embeddings (RoPE), Flash Attention, and an extended context length of up to 8,192 tokens. We adapt these innovations specifically to the biomedical and clinical domains. Clinical ModernBERT excels at producing semantically rich representations tailored to long-context tasks. We validate this both by analyzing its pretrained weights and through empirical evaluation on a comprehensive suite of clinical NLP benchmarks.
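The abstract lists rotary positional embeddings (RoPE) among the ModernBERT upgrades. The Python sketch below illustrates the core idea only. It assumes the interleaved-pair formulation and the base of 10,000 from the original RoPE (RoFormer) paper; ModernBERT's actual implementation may pair dimensions differently and use other base values. The helper names `apply_rope` and `dot` are hypothetical.

```python
import math
from typing import List


def apply_rope(x: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate consecutive dimension pairs of x by position-dependent angles.

    Illustrative sketch (assumed interleaved pairing, assumed base=10000),
    not ModernBERT's own code.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        # Frequency for the i-th pair: base^(-2i/d); low i rotates fastest
        theta = base ** (-2.0 * i / d)
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[2 * i], x[2 * i + 1]
        # Standard 2-D rotation of the pair (x0, x1) by `angle`
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product, standing in for an unscaled attention score."""
    return sum(u * v for u, v in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.8, 0.5]
    k = [1.0, 0.4, -0.7, 0.2]
    # Same relative offset (3) at two very different absolute positions
    s_near = dot(apply_rope(q, 5), apply_rope(k, 2))
    s_far = dot(apply_rope(q, 105), apply_rope(k, 102))
    print(round(s_near, 6), round(s_far, 6))
```

Because every pair is rotated by an angle proportional to its position, the rotated query-key score depends only on the offset between the two positions, so the two printed scores agree. Rotation also leaves vector norms unchanged, so position information is injected without rescaling the embeddings.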

Simon A. Lee, Anthony Wu, Jeffrey N. Chiang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Information Retrieval | TREC-COVID | nDCG@10 | 23.88 | 44 |
| Retrieval | SCIDOCS | nDCG@10 | 3.64 | 18 |
| Named Entity Recognition | bsc-bio-distemist ES | F1 score | 70.22 | 6 |
| Named Entity Recognition | cantemist ES | F1 score | 30.91 | 6 |
| Named Entity Recognition | pharmaconer ES | F1 score | 81.69 | 6 |
| Retrieval | AbSanitas ES | nDCG@10 | 18.08 | 6 |
| Retrieval | R2Med EN | nDCG@10 | 5.91 | 6 |
| Retrieval | SciFact EN | nDCG@10 | 20.34 | 6 |
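Most retrieval rows above are scored with nDCG@10. The page does not say which evaluation toolkit produced these numbers, so the Python sketch below is only an illustration of how the metric is commonly computed, assuming linear gains (the trec_eval convention); some toolkits use exponential gains of 2^rel − 1 instead. The function name `ndcg_at_k` and the `qrels` layout (document ID mapped to graded relevance) are hypothetical.

```python
import math
from typing import Dict, List


def ndcg_at_k(ranked_ids: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """nDCG@k with linear gains for a single query.

    DCG@k = sum over ranks r = 1..k of rel_r / log2(r + 1); nDCG divides by
    the DCG of the ideal ordering of all judged-relevant documents.
    """
    def dcg(gains: List[int]) -> float:
        # enumerate starts at 0, so rank r = i + 1 gets discount log2(i + 2)
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    # Unjudged documents count as non-relevant (gain 0)
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    # Ideal ranking: relevant documents sorted by grade, cut off at k
    ideal = sorted((r for r in qrels.values() if r > 0), reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    qrels = {"d1": 2, "d2": 1, "d3": 0}
    print(ndcg_at_k(["d1", "d2", "d3"], qrels))  # ideal order
    print(ndcg_at_k(["d3", "d1", "d2"], qrels))  # best document pushed to rank 2
```

A benchmark score such as those in the table is typically the mean of this per-query value over all test queries, often reported multiplied by 100.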
