EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence

About

Ultrasound has become the preferred imaging modality for early cancer screening because it uses no ionizing radiation, is low cost, and supports real-time imaging. However, conventional ultrasound diagnosis relies heavily on physician expertise, which makes it highly subjective and inefficient. Vision-language models (VLMs) offer a promising solution, but existing general-purpose models have limited knowledge of ultrasound tasks, generalize poorly in multi-organ lesion recognition, and are inefficient across multi-task diagnostics. To address these limitations, we propose EchoVLM, a vision-language model designed specifically for ultrasound medical imaging. The model employs a Mixture of Experts (MoE) architecture trained on data spanning seven anatomical regions. This design enables the model to perform multiple tasks, including ultrasound report generation, diagnosis, and visual question answering (VQA). On the ultrasound report generation task, EchoVLM improved over Qwen2-VL by 10.15 points in BLEU-1 and 4.77 points in ROUGE-1. These findings suggest that EchoVLM has substantial potential to enhance diagnostic accuracy in ultrasound imaging and could provide a viable technical solution for future clinical applications. Source code and model weights are available at https://github.com/Asunatan/EchoVLM.

Chaoyin She, Ruifang Lu, Lida Chen, Wei Wang, Qinghua Huang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Medical Report Generation | Ultrasound Breast | BLEU-1: 71.36 | 24 |
| Medical Report Generation | Ultrasound Gynecology | BLEU-1: 52.52 | 24 |
| Medical Report Generation | Ultrasound Kidney | BLEU-1: 77.56 | 24 |
| Medical Report Generation | Ultrasound Average | BLEU-1: 53.87 | 24 |
| Entity recognition | Public Liver Ultrasound Datasets OOD (test) | Hamming Accuracy: 91.06 | 12 |
| Entity recognition | Public Thyroid Ultrasound Datasets OOD (test) | Hamming Accuracy: 62.4 | 12 |
| Medical Report Generation | Ultrasound Liver | BLEU-1: 58.01 | 12 |
| Medical Report Generation | Ultrasound Thyroid | BLEU-1: 50.55 | 12 |
| Medical Report Generation | Public Ultrasound Breast OOD | BLEU-1: 29.16 | 12 |
| Medical Report Generation | Public Ultrasound Liver OOD | BLEU-1: 37.36 | 12 |
Showing 10 of 27 rows
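
For readers unfamiliar with the two metrics reported in these benchmarks, the following is a minimal sketch of how BLEU-1 and Hamming accuracy are commonly computed. It is not the authors' evaluation code. The function names `bleu1` and `hamming_accuracy` are hypothetical, whitespace tokenization and a single reference per report are assumptions, and Hamming accuracy is taken here to mean the fraction of label positions predicted correctly in a multi-label setting. The tokenizer and exact metric definitions behind the listed results are not stated on this page.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Single-reference BLEU-1: clipped unigram precision times brevity penalty."""
    cand = candidate.split()  # assumption: whitespace tokenization
    ref = reference.split()
    if not cand:
        return 0.0
    cand_counts = Counter(cand)
    ref_counts = Counter(ref)
    # Clip each candidate token's count by its count in the reference,
    # so repeating a correct word cannot inflate the score.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    precision = overlap / len(cand)
    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    return bp * precision


def hamming_accuracy(y_true, y_pred) -> float:
    """Fraction of (sample, label) positions where the prediction matches the target."""
    total = 0
    correct = 0
    for t_row, p_row in zip(y_true, y_pred):
        if len(t_row) != len(p_row):
            raise ValueError("each prediction row must match its target row in length")
        for t, p in zip(t_row, p_row):
            total += 1
            correct += int(t == p)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Illustrative strings and labels only; these are not data from the paper.
    print(bleu1("liver normal", "the liver is normal"))
    print(hamming_accuracy([[1, 0, 1], [0, 0, 1]], [[1, 1, 1], [0, 0, 0]]))
```

Both functions return values between 0 and 1. The results listed above appear to be on a 0 to 100 scale, so a value from this sketch would be multiplied by 100 before any comparison.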
