BioELX: Cross-lingual Biomedical Entity Linking via Alias-based Retrieval and LLM Ranking

About

Cross-lingual biomedical entity linking (BEL) maps mentions in any language to unique identifiers in a biomedical knowledge base (KB), supporting clinical and biomedical NLP applications. However, expert-annotated training data for BEL are costly, especially for low-resource languages. Moreover, many cross-lingual BEL systems rely on SapBERT-based retrievers trained on predominantly English aliases in the KB, leading to poor generalization to unseen non-English mentions and limited context-aware disambiguation. We propose BioELX, a two-stage cross-lingual BEL framework that requires no task-specific annotated training corpora. In Stage 1, we enrich SapBERT training with Wikidata-derived multilingual aliases and use the resulting retriever to improve cross-lingual candidate retrieval. In Stage 2, we perform context-aware disambiguation with a pre-trained LLM ranker that jointly considers the mention context and candidate, eliminating the need for supervised training. Experiments on five benchmarks (XL-BEL, EMEA, Patent, WikiMed-DE, and MedMentions) show that BioELX achieves new state-of-the-art performance. It improves average Recall@1 on XL-BEL by +19.2, with especially large gains for low-resource languages (e.g., +21.6 on Turkish, +22.1 on Korean, and +30.8 on Thai), and delivers consistent improvements on EMEA (+6.2), Patent (+5.4), and WikiMed-DE (+12.8). Code and resources will be released upon publication.
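To make the retrieve-then-rank structure and the Recall@1 metric concrete, here is a minimal sketch, not the authors' implementation. It makes several assumptions: character-trigram Jaccard overlap stands in for the SapBERT-based retriever, the `ranker` argument is a plain callable standing in for the LLM ranker, and the knowledge-base identifiers (`KB:001`, `KB:002`), the alias lists, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy KB: identifier -> multilingual aliases (illustrative only).
ALIASES: Dict[str, List[str]] = {
    "KB:001": ["heart attack", "myocardial infarction", "Herzinfarkt"],
    "KB:002": ["heart failure", "Herzinsuffizienz"],
}


def trigrams(text: str) -> Set[str]:
    """Character trigrams of the lowercased, space-padded text."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def retrieve(mention: str, aliases: Dict[str, List[str]], k: int = 2) -> List[str]:
    """Stage 1 stand-in: rank KB ids by their best alias overlap with the mention.

    A real system would compare dense embeddings; trigram Jaccard is an
    assumption made here so the sketch runs without any model.
    """
    m = trigrams(mention)

    def best_overlap(kb_id: str) -> float:
        return max(len(m & trigrams(a)) / len(m | trigrams(a)) for a in aliases[kb_id])

    return sorted(aliases, key=best_overlap, reverse=True)[:k]


def link(mention: str, context: str, aliases: Dict[str, List[str]],
         ranker: Callable[[str, str, List[str]], str], k: int = 2) -> str:
    """Stage 2 stand-in: let a ranker choose among the retrieved candidates."""
    return ranker(mention, context, retrieve(mention, aliases, k))


def recall_at_1(predictions: List[str], gold: List[str]) -> float:
    """Fraction of mentions whose top-ranked identifier equals the gold one."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def first_candidate(mention: str, context: str, candidates: List[str]) -> str:
    """Placeholder ranker that ignores context and keeps the retriever's top hit."""
    return candidates[0]
```

Calling `link("Herzinfarkt", "Patient mit akutem Herzinfarkt", ALIASES, first_candidate)` runs both stages on the toy KB; replacing `first_candidate` with a function that prompts a language model using the mention, its context, and the candidates is where context-aware disambiguation would plug in.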

Yi Wang, Corina Dima, Liangyu Zhong, Steffen Staab • 2026

Related benchmarks

Task                                    | Dataset               | Result           | Rank
Biomedical Entity Linking               | EMEA                  | Score (ES): 67.1 | 10
Biomedical Entity Linking               | Patent                | FR Score: 75.2   | 10
Cross-lingual Biomedical Entity Linking | XL-BEL                | EN Score: 91     | 10
Biomedical Entity Linking               | WikiMed DE            | DE Score: 67.4   | 9
Biomedical Entity Linking               | MedMentions EN (test) | Recall@1: 60.8   | 8