RDMA: Cost Effective Agent-Driven Rare Disease Mining from Electronic Health Records

About

Rare diseases affect 1 in 10 Americans yet remain systematically underdocumented in clinical records. ICD-based systems cannot capture their breadth: over 50% of Orphanet codes lack a direct ICD mapping and only 2.2% of HPO codes have matching ICD codes, leaving patient populations invisible and delaying diagnosis. Mining unstructured clinical notes offers a direct path forward, but real notes are long, noisy, and abbreviation-dense, and limited annotations make fine-tuning infeasible, so approaches must generalize without task-specific training. We present Rare Disease Mining Agents (RDMA), an agentic framework equipping smaller quantized LLMs with tools for abbreviation resolution, implicit phenotype reasoning, and ontology grounding against Orphanet and HPO. RDMA substantially outperforms fine-tuned and RAG-based baselines across benchmarks with different data characteristics, without any task-specific training. A small quantized model achieves maximal performance, reducing inference costs by up to 10x and local hardware costs by up to 17x, enabling private deployment on standard hardware without cloud-based PHI exposure. RDMA's uncertainty-flagging mechanism further reduces expert annotation burden while preserving agreement quality, supporting scalable rare disease documentation in clinical practice. Available at https://github.com/jhnwu3/RDMA.

John Wu, Adam Cross, Jimeng Sun • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Phenotype Mining | BiolarkGSC+ | F1 Score | 0.559 | 30 |
| Phenotype Extraction | CSC (n=116 docs) | F1 Score | 65.7 | 21 |
| Rare disease mention extraction | MIMIC3-RD Entity n=117 docs (test) | Micro-F1 | 59.2 | 19 |
| Rare disease mention extraction | MIMIC3 RD Code n=79 docs (test) | Micro-F1 Score | 52.6 | 19 |
| Rare disease mention extraction | RareDis n=1,011 docs (test) | Micro-F1 | 84.5 | 19 |
| Phenotype Mining | CSC | Precision | 64.4 | 9 |
| Rare Disease Mining | MIMIC RD Entity 3 | Precision | 51.6 | 7 |
| Rare Disease Mining | MIMIC RD Code 3 | Precision | 46 | 7 |
| Rare Disease Mining | RareDis | Precision | 87 | 7 |
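
For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how micro-averaged precision, recall, and F1 are commonly computed for extraction tasks: true positives, false positives, and false negatives are pooled over all documents before the ratios are taken. The function name `micro_prf`, the document IDs, and the labels are hypothetical, and exact set matching is an assumption; the source does not state the matching criteria the benchmarks use.

```python
from typing import Dict, Set, Tuple


def micro_prf(
    gold: Dict[str, Set[str]],
    pred: Dict[str, Set[str]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-document label sets.

    Assumes exact matching of labels (e.g. ontology codes); this is an
    illustrative choice, not the benchmark's documented protocol.
    """
    tp = fp = fn = 0
    # Pool counts across every document seen in either gold or predictions.
    for doc_id in gold.keys() | pred.keys():
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)   # predicted and correct
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: two notes with placeholder labels.
gold = {"note_1": {"term_a", "term_b"}, "note_2": {"term_c"}}
pred = {"note_1": {"term_a"}, "note_2": {"term_c", "term_d"}}
precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because counts are pooled, documents with many mentions weigh more heavily than they would under macro averaging, which is worth keeping in mind when comparing datasets of different sizes.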