MedSAM3: Delving into Segment Anything with Medical Concepts
About
Medical image segmentation is fundamental for biomedical discovery. Existing methods lack generalizability and demand extensive, time-consuming manual annotation for new clinical applications. Here, we propose MedSAM-3, a text-promptable model for medical image and video segmentation. By fine-tuning the Segment Anything Model (SAM) 3 architecture on medical images paired with semantic conceptual labels, MedSAM-3 enables medical Promptable Concept Segmentation (PCS), allowing precise targeting of anatomical structures via open-vocabulary text descriptions rather than solely geometric prompts. We further introduce the MedSAM-3 Agent, a framework that integrates Multimodal Large Language Models (MLLMs) to perform complex reasoning and iterative refinement in an agent-in-the-loop workflow. Comprehensive experiments across diverse medical imaging modalities, including X-ray, MRI, ultrasound, CT, and video, demonstrate that our approach significantly outperforms existing specialist and foundation models. We will release our code and model at https://github.com/Joey-S-Liu/MedSAM3.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Image Segmentation | PICAI | Dice | 0.376 | 19 |
| Lesion Grounding and Segmentation | AbdomenAtlas 3.0 | Dice Score (K) | 34.5 | 17 |
| Tumor Segmentation | LiTS | Dice | 0.703 | 17 |
| Medical Image Segmentation | PROMIS | Dice Coefficient | 0.323 | 16 |
| Medical Image Segmentation | BrainMetShare | Dice Score | 16.62 | 12 |
| Medical Image Segmentation | PENGWIN | Dice | 18.26 | 12 |
| Video Object Segmentation | CAMUS | Dice Coefficient | 67.15 | 9 |
| Video Object Segmentation | Breast Lesion | Dice Coefficient | 56.93 | 9 |
| Video Object Segmentation | Placenta | Dice (D) | 20.69 | 9 |
| Kidney Tumor Segmentation | KITS | Dice | 72.4 | 8 |
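
Every result in the table is a Dice score, but the values appear to be reported on two scales: as fractions (for example 0.376) and as percentages (for example 67.15). The following is a minimal sketch of how a Dice coefficient is commonly computed for a pair of binary masks. It is not the evaluation code behind these benchmarks; the function name `dice` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flat binary masks.

    `pred` and `target` are equal-length sequences of 0/1 (or bool) values.
    Assumption: two empty masks count as a perfect match (1.0); some
    benchmarks instead skip such cases or score them as 0.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # Pixels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    prediction = [1, 1, 0, 0]
    ground_truth = [1, 0, 0, 0]
    score = dice(prediction, ground_truth)
    # Print both the fractional and the percentage form.
    print(f"Dice = {score:.3f} ({100 * score:.2f}%)")
```

With one overlapping pixel and three foreground pixels in total, the example yields 2/3, which a leaderboard could list as either 0.667 or 66.67 depending on its scale.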