
MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making

About

Foundation models are becoming valuable tools in medicine. Yet despite their promise, the best way to leverage Large Language Models (LLMs) in complex medical tasks remains an open question. We introduce a novel multi-agent framework, named Medical Decision-making Agents (MDAgents), that helps address this gap by automatically assigning a collaboration structure to a team of LLMs. The assigned solo or group collaboration structure is tailored to the medical task at hand, emulating real-world medical decision-making processes adapted to tasks of varying complexity. We evaluate our framework and baseline methods using state-of-the-art LLMs across a suite of real-world medical knowledge and medical diagnosis benchmarks, including a comparison of LLMs' medical complexity classification against human physicians. MDAgents achieved the best performance in seven out of ten benchmarks on tasks requiring an understanding of medical knowledge and multi-modal reasoning, showing a significant improvement of up to 4.2% (p < 0.05) over the best performance of previous methods. Ablation studies reveal that MDAgents effectively determines medical complexity to optimize for efficiency and accuracy across diverse medical tasks. Notably, the combination of moderator review and external medical knowledge in group collaboration resulted in an average accuracy improvement of 11.8%. Our code can be found at https://github.com/mitmedialab/MDAgents.
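The routing idea in the abstract (classify the task, then assign a solo or group structure) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the prompts, the `LLM` callable type, and the use of a majority vote as a stand-in for moderator review are all assumptions made for illustration. The actual code is in the repository linked above.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a text reply.
LLM = Callable[[str], str]


def classify_complexity(question: str, router: LLM) -> str:
    """Ask a router model whether one agent or a group should handle the task."""
    reply = router(f"Classify this task as 'solo' or 'group': {question}")
    # Default to 'solo' unless the router explicitly asks for a group.
    return "group" if "group" in reply.strip().lower() else "solo"


def solo_answer(question: str, agent: LLM) -> str:
    """A single agent answers directly."""
    return agent(f"Answer the question: {question}").strip()


def group_answer(question: str, agents: List[LLM]) -> str:
    """Each agent answers; a majority vote stands in for moderator review (assumption)."""
    answers = [agent(f"Answer the question: {question}").strip() for agent in agents]
    return Counter(answers).most_common(1)[0][0]


def decide(question: str, router: LLM, agents: List[LLM]) -> str:
    """Route the question to a solo or group structure based on the router's label."""
    if classify_complexity(question, router) == "solo":
        return solo_answer(question, agents[0])
    return group_answer(question, agents)
```

With stub callables in place of real models, `decide(q, router, agents)` returns the first agent's answer when the router replies "solo" and the most common answer across agents when it replies "group".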

Yubin Kim, Chanwoo Park, Hyewon Jeong, Yik Siu Chan, Xuhai Xu, Daniel McDuff, Hyeonhoon Lee, Marzyeh Ghassemi, Cynthia Breazeal, Hae Won Park • 2024

Related benchmarks

Task                               Dataset                                      Metric             Result   Rank
Medical Question Answering         MedMCQA                                      Accuracy           37.3     346
Medical Question Answering         MedQA                                        Accuracy           87.3     153
Question Answering                 PubMedQA                                     Accuracy           78.5     145
Medical Question Answering         PubMedQA                                     Accuracy           48.7     92
Medical Visual Question Answering  PMC-VQA                                      Accuracy           56.4     74
Medical Question Answering         Medbullets                                   Accuracy           79.6     65
final diagnosis prediction         MedCaseReasoning                             Accuracy           60.7     56
final diagnosis prediction         MIMIC, MedCaseReasoning, ER-Reason Average   Average Accuracy   58.1     56
final diagnosis prediction         MIMIC                                        Accuracy           90.4     56
final diagnosis prediction         ER-Reason                                    Accuracy           23.3     56
Showing 10 of 67 rows
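
As a quick consistency check on the table, the "Average Accuracy" row for final diagnosis prediction appears to be the unweighted mean of the MIMIC, MedCaseReasoning, and ER-Reason rows. The page does not state how the average is computed, so the unweighted mean is an assumption; the short sketch below only verifies that it reproduces the listed figure.

```python
# Accuracies copied from the final diagnosis prediction rows of the table.
accuracies = {
    "MIMIC": 90.4,
    "MedCaseReasoning": 60.7,
    "ER-Reason": 23.3,
}

# Assumption: the listed average is a simple unweighted mean of the three rows.
mean_accuracy = sum(accuracies.values()) / len(accuracies)
rounded_mean = round(mean_accuracy, 1)

print(rounded_mean)  # matches the 58.1 listed in the table
```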
