CUICurate: A GraphRAG-based Framework for Automated Clinical Concept Curation for NLP applications
About
Background: Clinical named entity recognition tools commonly map free text to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs). For many downstream tasks, however, the clinically meaningful unit is not a single CUI but a concept set comprising related synonyms, subtypes, and associated concepts. Constructing these sets is labour-intensive, inconsistently performed, and poorly supported by existing tools. Methods: We present CUICurate, a graph-based retrieval-augmented generation (GraphRAG) framework for automated UMLS concept set curation. A UMLS knowledge graph (KG) was constructed and embedded for semantic retrieval. Candidate CUIs were retrieved using graph-based expansion and then filtered and classified using large language models (GPT-5 and Qwen3-32B). The framework was evaluated on five lexically heterogeneous clinical concepts against manually curated benchmark concept sets and gold-standard concept sets. Results: CUICurate produced substantially larger and more complete concept sets than the manual benchmarks. A single retrieval configuration, applied across all concepts, achieved high recall of definitive concepts with manageable candidate sets. GPT-5 outperformed manual curation for all concepts and retained at least 95% of definitive gold-standard CUIs, while Qwen3-32B achieved comparable but slightly lower performance. Many missed concepts were not observed in 10,000 MIMIC-III notes. CUICurate infrastructure and end-to-end processing were inexpensive and stable across runs. Conclusions: CUICurate offers a scalable, reproducible and cost-efficient approach for generating clinician-reviewable UMLS concept sets tailored to clinical natural language processing and phenotyping applications.
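The retrieval step described in the abstract (graph-based expansion from a target concept, scored by recall against a gold-standard set) can be illustrated with a minimal sketch. This is not the CUICurate implementation: the breadth-first traversal, the `max_hops` parameter, the function names, and the toy graph with placeholder identifiers (`CUI_A` to `CUI_E`) are all assumptions made for illustration, and the LLM filtering stage is omitted.

```python
from collections import deque


def expand_candidates(graph, seeds, max_hops):
    """Collect every node reachable within max_hops of any seed (breadth-first).

    Assumption: the knowledge graph is a plain adjacency dict; the real
    framework's graph store and expansion rules are not specified here.
    """
    seen = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen


def recall(retrieved, gold):
    """Fraction of gold-standard identifiers present in the retrieved set."""
    gold = set(gold)
    if not gold:
        return 0.0
    return len(gold & set(retrieved)) / len(gold)


# Hypothetical toy graph; identifiers are placeholders, not real UMLS CUIs.
TOY_GRAPH = {
    "CUI_A": ["CUI_B", "CUI_C"],
    "CUI_B": ["CUI_D"],
    "CUI_C": [],
    "CUI_D": ["CUI_E"],
    "CUI_E": [],
}

if __name__ == "__main__":
    candidates = expand_candidates(TOY_GRAPH, ["CUI_A"], max_hops=2)
    print(sorted(candidates))
    print(recall(candidates, {"CUI_B", "CUI_D", "CUI_E"}))
```

Increasing `max_hops` raises recall at the cost of a larger candidate set, which is the trade-off the abstract refers to when it mentions high recall with manageable candidate sets.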
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Graph Retrieval | UMLS manual concept sets M | Recall: 98 | 5 |
| Clinical Concept Classification | UMLS Five Target Concepts Definitive class (test) | Macro Recall: 71 | 3 |
| LLM Filtering | Manually adjudicated gold-standard CUIs Chronic Heart Failure v1 (test) | CUIs Count: 137 | 3 |
| LLM Filtering | Manually adjudicated gold-standard CUIs Fluid Overload v1 (test) | CUIs Count: 77 | 3 |
| LLM Filtering | Manually adjudicated gold-standard CUIs Ischaemic Stroke v1 (test) | Total CUIs: 277 | 3 |
| LLM Filtering | Manually adjudicated gold-standard CUIs LV Systolic Dysfunction v1 (test) | CUI Count: 90 | 3 |
| LLM Filtering | Manually adjudicated gold-standard CUIs Poor Mobility v1 (test) | CUIs Count: 205 | 3 |
| Clinical Concept Classification | UMLS Five Target Concepts All Classes (test) | Macro Recall: 0.72 | 3 |
| Graph Retrieval | Chronic heart failure concept set (val) | Manual CUI Count: 98 | 1 |
| Graph Retrieval | Fluid overload concept set (Manual val) | CUIs (Manual Count): 30 | 1 |