CODER: Knowledge infused cross-lingual medical term embedding for term normalization
About
This paper proposes CODER: contrastive learning on knowledge graphs for cross-lingual medical term representation. CODER is designed for medical term normalization by providing close vector representations for different terms that represent the same or similar medical concepts, with cross-lingual support. We train CODER via contrastive learning on a medical knowledge graph (KG), the Unified Medical Language System, where similarities are calculated using both terms and relation triplets from the KG. Training with relations injects medical knowledge into the embeddings and aims to provide potentially better machine learning features. We evaluate CODER on zero-shot term normalization, semantic similarity, and relation classification benchmarks, which show that CODER outperforms various state-of-the-art biomedical word embeddings, concept embeddings, and contextual embeddings. Our code and models are available at https://github.com/GanjinZero/CODER.
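To make the idea of normalization through "close vector representations" concrete, the following is a minimal sketch of nearest-neighbor term normalization by cosine similarity. It is not the authors' implementation: the function names `cosine` and `normalize_term`, the concept IDs, and the three-dimensional vectors are hypothetical placeholders standing in for embeddings that a model such as CODER would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def normalize_term(query_vec, concept_vecs):
    """Return the concept ID whose embedding is closest to query_vec."""
    return max(concept_vecs, key=lambda cid: cosine(query_vec, concept_vecs[cid]))


# Hypothetical toy embeddings; real ones would come from a trained encoder.
concept_vecs = {
    "CONCEPT_A": [0.9, 0.1, 0.0],
    "CONCEPT_B": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of an unseen surface form (e.g. a synonym or translation).
query_vec = [0.8, 0.2, 0.1]

best = normalize_term(query_vec, concept_vecs)
print(best)
```

In this setup, no classifier is trained for the normalization step itself: a query term is mapped to whichever concept embedding lies nearest, which is why the quality of the embedding space matters for the zero-shot setting described above.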
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Feature Selection Alignment | Expert-labeled disease feature relevance dataset | T1D Alignment Score | 80.3 | 26 |
| Concept Similarity Detection | Multi-institutional EHR dataset | AUC | 96.9 | 25 |
| Relatedness Detection | Multi-institutional EHR dataset | AUC | 0.831 | 25 |
| Clinical Similarity Detection | General Clinical Relation Pairs | AUC | 0.876 | 25 |
| Feature selection evaluation | GPT-4 Feature Relevance Estimation Suite Silver Standard (test) | T1D Score | 64.4 | 25 |
| Clinical Relatedness Detection | General Clinical Relation Pairs | AUC | 65.5 | 25 |
| Cross-institutional code mapping | UPMC LAB-LOINC | Spearman's Rank Correlation | 0.554 | 24 |
| Cross-institutional code mapping | UPMC PX-CCS | Spearman's Correlation | 0.418 | 24 |
| Cross-institutional code mapping | BDX CCAM-CCS | Spearman Correlation | 0.54 | 24 |
| Medical Code Mapping | VA local laboratory codes to LOINC/LP | Top-1 Accuracy | 55.6 | 21 |