Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?

About

Multi-agent large language model (LLM) systems have emerged as a promising approach for clinical diagnosis, leveraging collaboration among agents to refine medical reasoning. However, most existing frameworks rely on single-vendor teams (e.g., multiple agents from the same model family), which risk correlated failure modes that reinforce shared biases rather than correcting them. We investigate the impact of vendor diversity by comparing Single-LLM, Single-Vendor, and Mixed-Vendor Multi-Agent Conversation (MAC) frameworks. Using three doctor agents instantiated with o4-mini, Gemini-2.5-Pro, and Claude-4.5-Sonnet, we evaluate performance on RareBench and DiagnosisArena. Mixed-vendor configurations consistently outperform single-vendor counterparts, achieving state-of-the-art recall and accuracy. Overlap analysis reveals the underlying mechanism: mixed-vendor teams pool complementary inductive biases, surfacing correct diagnoses that individual models or homogeneous teams collectively miss. These results highlight vendor diversity as a key design principle for robust clinical diagnostic systems.

Grace Chang Yuan, Xiaoman Zhang, Sung Eun Kim, Pranav Rajpurkar • 2026
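
The Recall@1 metric and the overlap analysis mentioned in the abstract can be illustrated with a small sketch. This is not the paper's evaluation code: the function names, the toy cases, the placeholder model names, and the use of exact string matching to decide correctness are all assumptions made for illustration. The sketch only shows how one might count which cases each model solves and how the sets overlap. It does not model the multi-agent conversation itself.

```python
from typing import Dict, List, Set


def recall_at_k(predictions: List[List[str]], gold: List[str], k: int = 1) -> float:
    """Fraction of cases whose gold diagnosis appears in the top-k predictions.

    Assumption: a prediction is correct only on an exact string match.
    """
    if not gold:
        raise ValueError("need at least one case")
    hits = sum(1 for preds, g in zip(predictions, gold) if g in preds[:k])
    return hits / len(gold)


def solved_cases(predictions: List[List[str]], gold: List[str], k: int = 1) -> Set[int]:
    """Indices of the cases a model gets right within its top-k predictions."""
    return {i for i, (preds, g) in enumerate(zip(predictions, gold)) if g in preds[:k]}


# Hypothetical toy data: four cases, three placeholder models.
gold = ["fabry disease", "marfan syndrome", "wilson disease", "gaucher disease"]
model_outputs: Dict[str, List[List[str]]] = {
    "model_a": [["fabry disease"], ["ehlers-danlos syndrome"],
                ["hemochromatosis"], ["gaucher disease"]],
    "model_b": [["fabry disease"], ["marfan syndrome"],
                ["hemochromatosis"], ["niemann-pick disease"]],
    "model_c": [["pompe disease"], ["marfan syndrome"],
                ["wilson disease"], ["niemann-pick disease"]],
}

# Per-model sets of solved cases, then their union and intersection.
solved = {name: solved_cases(preds, gold) for name, preds in model_outputs.items()}
solved_by_any = set().union(*solved.values())
solved_by_all = set.intersection(*solved.values())

for name, preds in model_outputs.items():
    print(name, "Recall@1 =", recall_at_k(preds, gold, k=1))
print("solved by at least one model:", sorted(solved_by_any))
print("solved by every model:", sorted(solved_by_all))
```

In this toy setup each model solves a different pair of cases, so the union covers more cases than any single model does. That is the kind of complementarity an overlap analysis looks for, although a real evaluation would need a more careful correctness judgment than string equality.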

Related benchmarks

Task                          Dataset                       Metric          Result  Rank
Clinical Diagnosis            RareBench (combined)          Recall@1        39.31   7
Clinical Diagnosis            RareBench HMS subset (n=88)   Recall@1        51.14   7
Clinical diagnosis retrieval  RareBench MME (n=40)          R@1             40      7
Diagnosis                     DiagnosisArena                Top-1 Accuracy  36.36   7
Clinical Diagnosis            RareBench LIRICAL             Recall@1        35.69   7
