Human or LLM as Standardized Patients? A Comparative Study for Medical Education
About
Standardized patients (SPs) are indispensable for clinical skills training but remain expensive and difficult to scale. Although large language model (LLM)-based virtual standardized patients (VSPs) have been proposed as an alternative, their behavior remains unstable, and they have not been rigorously compared with human standardized patients. We propose EasyMED, a multi-agent VSP framework that separates case-grounded information disclosure from response generation to support stable, inquiry-conditioned patient behavior. We also introduce SPBench, a human-grounded benchmark with eight expert-defined criteria for interaction-level evaluation. Experiments show that EasyMED more closely matches human SP behavior than existing VSPs, particularly in case consistency and controlled disclosure. A four-week controlled study further demonstrates learning outcomes comparable to human SP training, with stronger early gains for novice learners and improved flexibility, psychological safety, and cost efficiency.
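The sketch below illustrates the separation the abstract describes: one component gates which case facts an inquiry may disclose, and a second turns only those facts into a patient-voice reply. This is not the authors' implementation; the class names, the keyword-matching heuristic, and the template-based reply are hypothetical stand-ins for the paper's LLM-based agents.

```python
# Minimal sketch of a two-stage VSP pipeline: disclosure is decided
# separately from response generation, so the patient cannot volunteer
# facts the student never asked about. All names here are hypothetical.
from dataclasses import dataclass

@dataclass
class CaseFact:
    topic: str    # e.g. "chief_complaint", "medication"
    content: str  # the ground-truth detail from the SP case script

CASE = [
    CaseFact("chief_complaint", "crushing chest pain for the last two hours"),
    CaseFact("medication", "takes daily aspirin"),
    CaseFact("family_history", "father had a heart attack at 55"),
]

def disclosure_agent(inquiry: str, case: list[CaseFact]) -> list[CaseFact]:
    """Gate case-grounded information: release only facts the inquiry asks
    about. A real system would use an LLM relevance judge; keyword overlap
    is a stub."""
    keywords = {
        "chief_complaint": ["pain", "wrong", "brings"],
        "medication": ["medication", "drugs", "taking"],
        "family_history": ["family", "father", "mother"],
    }
    inquiry_lower = inquiry.lower()
    return [f for f in case if any(k in inquiry_lower for k in keywords[f.topic])]

def response_agent(facts: list[CaseFact]) -> str:
    """Generate the patient reply from the disclosed facts only, so surface
    wording can vary while the content stays case-consistent."""
    if not facts:
        return "I'm not sure what you mean, doctor."
    return "Well, " + "; and ".join(f.content for f in facts) + "."

if __name__ == "__main__":
    for inquiry in ["What brings you in today?", "Are you taking any medication?"]:
        facts = disclosure_agent(inquiry, CASE)
        print(f"Student: {inquiry}\nPatient: {response_agent(facts)}\n")
```

Keeping disclosure as its own stage is what makes the behavior inquiry-conditioned: the response generator never sees undisclosed facts, so it cannot leak them regardless of how it phrases the reply.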
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Virtual Standardized Patient Simulation | SPBench | QC Score | 97.17 | 9 |
| Clinical Skill Acquisition | OSCE Pre-test | Mean Score | 70.56 | 2 |
| Clinical Skill Acquisition | OSCE Mid-test | Mean Score | 86.07 | 2 |
| Clinical Skill Acquisition | OSCE Post-test | Mean Score | 87.44 | 2 |
| Clinical Skill Acquisition | OSCE Phase 1 (Weeks 1-2) | Mean Score Gain | 15.51 | 2 |
| Clinical Skill Acquisition | OSCE Phase 2 (Weeks 3-4) | Mean Score Gain | 3.19 | 2 |
| Clinical Skill Acquisition | OSCE Total Gain | Mean Score Gain | 16.89 | 2 |
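For readers unfamiliar with the "Mean Score Gain" rows: a phase gain is conventionally the mean of each learner's end-of-phase score minus their start-of-phase score. The sketch below shows that computation on invented data; whether the paper aggregates exactly this way (e.g. over the same learner subset in every phase) is an assumption, and the numbers here are not the study's data.

```python
# Minimal sketch, with invented scores, of phase-wise mean OSCE gains.
from statistics import mean

# Hypothetical (learner -> [pre, mid, post]) OSCE scores out of 100.
scores = {
    "learner_1": [68.0, 84.0, 86.0],
    "learner_2": [73.0, 88.0, 89.0],
}

# Transpose per-learner rows into per-test columns.
pre, mid, post = (list(col) for col in zip(*scores.values()))

phase1_gain = mean(m - p for p, m in zip(pre, mid))   # Weeks 1-2
phase2_gain = mean(q - m for m, q in zip(mid, post))  # Weeks 3-4
total_gain = mean(q - p for p, q in zip(pre, post))   # pre -> post

print(f"Phase 1 gain: {phase1_gain:.2f}")
print(f"Phase 2 gain: {phase2_gain:.2f}")
print(f"Total gain:   {total_gain:.2f}")
```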