
End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning

About

The integration of Large Language Models (LLMs) into healthcare is constrained by knowledge limitations, hallucinations, and a disconnect from Evidence-Based Medicine (EBM). While Retrieval-Augmented Generation (RAG) offers a solution, current systems often rely on static workflows that miss the iterative, hypothetico-deductive reasoning of clinicians. To address this, we introduce Deep-DxSearch, an agentic RAG system trained end-to-end via reinforcement learning (RL) for traceable diagnostic reasoning. Deep-DxSearch acts as an active investigator, treating the LLM as an agent within an environment of 16,000+ guideline-derived disease profiles, 150,000+ patient records for case-based reasoning, and over 27 million biomedical documents. Using soft verifiable rewards that co-optimize retrieval and reasoning, the model learns to formulate queries, evaluate evidence, and refine searches to close diagnostic gaps. Experiments show our end-to-end RL framework consistently outperforms prompt-engineering and training-free RAG methods. On in-distribution (ID) and out-of-distribution (OOD) benchmarks for common and rare diseases, Deep-DxSearch surpasses strong baselines (including GPT-4o, DeepSeek-R1, and medical-specific frameworks), achieving an average accuracy gain of 22.7% over the second-best model. In validation with 150 real-world cases, Deep-DxSearch boosts physicians' average diagnostic accuracy from 45.6% to 69.1%. These results indicate that evolving agentic systems to leverage statistical regularities in large-scale healthcare data is key for trustworthy diagnostic assistants. All data, code, and checkpoints are available at https://qiaoyu-zheng.github.io/Deep-DxSearch.

Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Jian Zhang, Yanfeng Wang, Ya Zhang, Weidi Xie • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Rare Disease Diagnosis | RareBench MME | R@1 42.5 | 21 |
| Rare Disease Diagnosis | DDD | Recall@1 39.42 | 21 |
| Rare Disease Diagnosis | MyGene | R@1 30.14 | 21 |
| Rare Disease Diagnosis | RareBench HMS | Recall@1 36.36 | 21 |
| Rare Disease Diagnosis | RareBench LIRICAL | R@1 29.46 | 21 |
| Rare Disease Diagnosis | MIMIC-IV Rare | R@1 12.75 | 21 |
| Rare Disease Diagnosis | RareBench RAMEDIS | Recall@1 28.57 | 21 |
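
The results above are reported as R@1 (Recall@1). As a rough illustration of how such a figure is typically computed, the sketch below counts the share of cases whose top-ranked prediction matches the reference diagnosis. The function name `recall_at_k`, the exact string matching, and the toy cases are assumptions made for illustration; they are not taken from the Deep-DxSearch code or from the benchmark definitions, which may normalize disease names differently.

```python
from typing import Sequence


def recall_at_k(
    ranked_predictions: Sequence[Sequence[str]],
    gold: Sequence[str],
    k: int = 1,
) -> float:
    """Percentage of cases whose gold diagnosis is among the top-k predictions.

    Assumes exact string matching between predicted and gold disease names
    (a simplification made for this sketch).
    """
    if len(ranked_predictions) != len(gold):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold:
        raise ValueError("at least one case is required")
    hits = sum(
        1 for preds, truth in zip(ranked_predictions, gold) if truth in preds[:k]
    )
    return 100.0 * hits / len(gold)


# Hypothetical toy cases (not from the paper): each inner list is a ranked
# differential diagnosis for one patient, most likely disease first.
predictions = [
    ["Marfan syndrome", "Ehlers-Danlos syndrome"],
    ["Fabry disease", "Gaucher disease"],
    ["Wilson disease", "Hemochromatosis"],
    ["Pompe disease", "McArdle disease"],
]
gold = ["Marfan syndrome", "Gaucher disease", "Wilson disease", "Cystic fibrosis"]

print(recall_at_k(predictions, gold, k=1))
print(recall_at_k(predictions, gold, k=2))
```

With k=1 only the first entry of each ranked list counts, so a correct diagnosis listed second (as in the second toy case) is scored as a miss.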
