UMind-VL: A Generalist Ultrasound Vision-Language Model for Unified Grounded Perception and Comprehensive Interpretation

About

Despite significant strides in medical foundation models, the ultrasound domain lacks a comprehensive solution capable of bridging low-level Ultrasound Grounded Perception (e.g., segmentation, localization) and high-level Ultrasound Comprehensive Interpretation (e.g., diagnosis, reasoning). To bridge this gap, we propose UMind-VL, a unified foundation model designed to synergize pixel-level structural understanding with complex clinical reasoning. We first introduce UMind-DS, a large-scale multimodal dataset comprising 1.2 million ultrasound image-text pairs across 16 anatomical regions, enriching standard data with pixel-level annotations and clinician-validated rationales. Architecturally, UMind-VL incorporates a lightweight Dynamic Convolutional Mask Decoder that generates masks via dynamic kernels conditioned on LLM outputs. This design, combined with task-specific tokens, unifies segmentation, detection, geometric measurement, and diagnosis tasks within a single framework. Extensive evaluations demonstrate that UMind-VL significantly outperforms existing generalist multimodal models and achieves performance on par with, or superior to, state-of-the-art specialist models across segmentation, detection, keypoint localization, and diagnostic reasoning benchmarks, while maintaining strong generalization ability. We demonstrate the capability of UMind-VL in Figure 1.
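
The abstract states only that the mask decoder "generates masks via dynamic kernels conditioned on LLM outputs". As a rough illustration of that general idea, and not the authors' implementation, the sketch below projects a token embedding to the weights of a 1x1 convolution and applies it to a per-pixel feature map. The function names, the linear projection, the 1x1 kernel size, and the 0.5 threshold are all assumptions made for this example.

```python
import math

def project_to_kernel(token_embedding, proj_weights, proj_bias):
    """Hypothetical linear layer: token embedding -> (kernel weights, bias).

    proj_weights has C + 1 rows: C kernel weights plus one bias term.
    """
    out = [
        sum(w * x for w, x in zip(row, token_embedding)) + b
        for row, b in zip(proj_weights, proj_bias)
    ]
    return out[:-1], out[-1]

def dynamic_mask(features, kernel, bias, threshold=0.5):
    """Apply a 1x1 dynamic convolution to an H x W x C feature map.

    Returns an H x W binary mask (nested lists of 0/1).
    """
    mask = []
    for row in features:
        mask_row = []
        for pixel in row:
            logit = sum(k * f for k, f in zip(kernel, pixel)) + bias
            prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
            mask_row.append(1 if prob >= threshold else 0)
        mask.append(mask_row)
    return mask

# Toy inputs (assumed values, for illustration only).
token_embedding = [1.0, 0.0]
proj_weights = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # 2 channels + bias
proj_bias = [0.0, 0.0, 0.0]
features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.2, 0.5], [0.9, 0.1]],
]

kernel, bias = project_to_kernel(token_embedding, proj_weights, proj_bias)
mask = dynamic_mask(features, kernel, bias)
print(mask)  # [[1, 0], [0, 1]]
```

Because the kernel is produced from the token embedding at inference time, a different embedding yields a different mask from the same feature map, which is what lets one decoder serve many prompts.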

Dengbo Chen, Ziwei Zhao, Kexin Zhang, Shishuang Zhao, Junjie Hou, Yaqian Wang, Nianxi Liao, Anlan Sun, Fei Gao, Jia Ding, Yuhang Liu, Dong Wang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Image Segmentation | breast ultrasound (test) | mIoU | 0.8122 | 19 |
| Diagnosis | BUS-CoT | Accuracy | 77.74 | 14 |
| Diagnosis | TN5K | Accuracy | 89.4 | 14 |
| Medical lesion detection | BUS-CoT (test) | Precision | 94.28 | 14 |
| Medical lesion detection | TN5k (test) | Precision | 90.92 | 14 |
| Diagnosis | BUS-BRA OOD | Accuracy | 84.96 | 11 |
| Medical Segmentation | Gynecological Ultrasound (test) | mIoU | 0.7238 | 7 |
| Medical Segmentation | Abdominal Ultrasound (test) | mIoU | 60.07 | 7 |
| Medical Segmentation | Musculoskeletal Ultrasound (test) | mIoU | 81.24 | 7 |
| Medical Segmentation | Thyroid Ultrasound (test) | mIoU | 0.735 | 7 |
Showing 10 of 17 rows
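
For readers unfamiliar with the mIoU metric listed above, here is a minimal sketch of one common way to compute it: intersection over union per binary mask, averaged over images. The page does not say which averaging convention each benchmark uses (per image or per class), so that choice is an assumption, as are the function names. Note also that the listed mIoU values appear on both a 0 to 1 scale and a 0 to 100 scale; this sketch returns a fraction.

```python
def iou(pred, target):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union

def mean_iou(preds, targets):
    """Average IoU over paired predictions and ground-truth masks."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)

# Toy masks (assumed values, for illustration only).
preds = [
    [[1, 1], [0, 0]],
    [[1, 0], [0, 0]],
]
targets = [
    [[1, 0], [0, 0]],
    [[1, 0], [0, 0]],
]

score = mean_iou(preds, targets)
print(score)  # 0.75
```

Multiplying the result by 100 gives the percentage form used by some rows.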
