ClinTutor-R1: Advancing Scalable and Robust One-to-Many Alignment in Clinical Socratic Education
About
While Large Language Models (LLMs) have achieved remarkable success in dyadic (one-on-one) instruction, they face significant challenges in One-to-Many alignment. This matters in settings such as clinical ward rounds, where an instructor must simultaneously guide a diverse group of trainees. Current models often suffer from context dilution and goal misalignment, and fail to balance individual scaffolding with the group's collective learning progress. To address this, we introduce ClinEdu, a multi-agent pedagogical simulator that models the complexity of group dynamics. Using this platform, we construct ClinTeach, a large-scale dataset of Socratic teaching dialogues. We then propose ClinTutor-R1, the first vision-language agent explicitly designed for one-to-many alignment in clinical education. It uses an explicit internal thinking mechanism to model both individual belief states and group consensus. We validate our framework through a comprehensive protocol covering static benchmarks, in-situ interactive evaluation within ClinEdu, expert assessment, and a real user study with 200 participants. Experimental results show that ClinTutor-R1 outperforms its base models by over 20%, performs on par with proprietary models, and maintains instructional quality as student cohorts grow.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Visual Question Answering | PMC-VQA | Accuracy | 56.3 | 103 |
| Medical Visual Question Answering | MedXpertQA | Accuracy | 25.1 | 44 |
| Medical Question Answering | MedXpertQA (test) | ETS score | 8.33 | 23 |
| Medical Question Answering | MVME (test) | ETS score | 8.41 | 23 |
| Medical Visual Question Answering | MMMU | Accuracy | 58.82 | 19 |