
Harnessing Large Language Models for Biomedical Named Entity Recognition

About

Background and Objective: Biomedical Named Entity Recognition (BioNER) is a foundational task in medical informatics, crucial for downstream applications like drug discovery and clinical trial matching. However, adapting general-domain Large Language Models (LLMs) to this task is often hampered by their lack of domain-specific knowledge and the performance degradation caused by low-quality training data. To address these challenges, we introduce BioSelectTune, a highly efficient, data-centric framework for fine-tuning LLMs that prioritizes data quality over quantity. Methods and Results: BioSelectTune reformulates BioNER as a structured JSON generation task and leverages our novel Hybrid Superfiltering strategy, a weak-to-strong data curation method that uses a homologous weak model to distill a compact, high-impact training dataset. Conclusions: Through extensive experiments, we demonstrate that BioSelectTune achieves state-of-the-art (SOTA) performance across multiple BioNER benchmarks. Notably, our model, trained on only 50% of the curated positive data, not only surpasses the fully-trained baseline but also outperforms powerful domain-specialized models like BioMedBERT.
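The abstract describes two ideas: casting BioNER as structured JSON generation, and filtering training data with a weak model. As an illustration only (the paper's actual prompt format and Hybrid Superfiltering scoring are not given here), the sketch below shows one plausible way to serialize BIO-tagged tokens into a JSON target and to keep the top-scoring half of a dataset under a hypothetical weak-model score; the function names `to_json_target` and `superfilter` are our own, not from the paper.

```python
import json

def to_json_target(tokens, labels):
    """Serialize BIO-tagged tokens into a JSON-generation target.

    Assumed format: {"entities": [{"type": ..., "text": ...}, ...]}.
    """
    entities, current, current_type = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:  # close the previous entity span
                entities.append({"type": current_type, "text": " ".join(current)})
            current, current_type = [tok], lab[2:]
        elif lab.startswith("I-") and current:
            current.append(tok)  # continue the current entity span
        else:
            if current:
                entities.append({"type": current_type, "text": " ".join(current)})
            current, current_type = [], None
    if current:  # flush a trailing entity
        entities.append({"type": current_type, "text": " ".join(current)})
    return json.dumps({"entities": entities})

def superfilter(examples, weak_score, keep_ratio=0.5):
    """Weak-to-strong curation sketch: rank examples by a score from a
    homologous weak model (higher = more informative) and keep the top
    keep_ratio fraction. The scoring function itself is an assumption."""
    ranked = sorted(examples, key=weak_score, reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return ranked[:k]

# Example: one sentence with a chemical and a disease mention.
tokens = ["Aspirin", "reduces", "fever"]
labels = ["B-Chemical", "O", "B-Disease"]
target = to_json_target(tokens, labels)
```

With `keep_ratio=0.5`, `superfilter` mirrors the abstract's "50% of the curated positive data" setting; any real scoring function would come from the weak model's losses or preferences, which this sketch leaves abstract.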

Jian Chen, Leilei Su, Cong Sun • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Named Entity Recognition | BC5CDR | – | 59 |
| Biomedical Named Entity Recognition | NCBI-disease | Strict Match F1: 88.29 | 8 |
| Biomedical Named Entity Recognition | BC5CDR-Disease | Strict F1-Score: 85.71 | 7 |
| Biomedical Named Entity Recognition | bc2gm | Strict Match F1: 81.44 | 7 |
| Named Entity Recognition | NLM-Gene (unseen out-of-domain) | Precision: 85.79 | 3 |
| Named Entity Recognition | NLM-Chem (unseen out-of-domain) | Precision: 85.24 | 3 |
