
Adapting Foundation Vision-Language Models to Medical Diagnosis via Query-Driven Expert Bridging

About

Vision-language foundation models achieve promising performance in natural image classification, yet their direct application to medical imaging is limited by severe domain shifts, resolution mismatches, and the multi-label nature of clinical diagnosis. Training dedicated medical foundation models from scratch, however, is costly and data-intensive. Here, we propose MedBridge, a lightweight adaptation framework that opens a new direction in domain-gap mitigation by jointly combining domain alignment, resolution preservation, and multi-label reasoning via complementary VLM experts for medical image diagnosis. Specifically, MedBridge transforms pretrained VLMs into multi-view query encoders that inject a compact set of learnable query tokens into intermediate layers, enabling non-destructive domain alignment while preserving fine-grained pathological cues via multi-view high-resolution sampling. These query tokens further act as routing signals for a mixture-of-experts, dynamically integrating heterogeneous foundation models for multi-label reasoning without requiring a shared representation space. We evaluated MedBridge on five chest radiograph benchmarks in three key adaptation tasks. MedBridge demonstrates superior performance in both cross-domain generalization (out-of-distribution transfer) and in-domain specialization (same-distribution tuning) settings, yielding a significant 6-15% AUC improvement over state-of-the-art adaptation methods for multi-label thoracic disease diagnosis. Furthermore, MedBridge is model-agnostic and demonstrates broad extensibility across eight diverse VLMs (e.g., CLIP, LLaVA, Qwen-VL, MedGemma), highlighting its ability to flexibly adapt arbitrary foundation models into a powerful medical diagnostic tool. Our code will be released upon acceptance.

Yitong Li, Morteza Ghahremani, Christian Wachinger • 2025
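The abstract describes query tokens that act as routing signals for a mixture of experts whose feature spaces need not match. Since the authors' code has not been released, the following is only a minimal sketch of that general idea, not their implementation. Every name here (`route_and_fuse`, `gate_W`, `expert_heads`) is hypothetical, and the mean pooling, linear gate, and per-expert linear heads are assumptions chosen for simplicity.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def mean_pool(tokens):
    """Average a list of equal-length token vectors into one vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def route_and_fuse(query_tokens, expert_features, gate_W, expert_heads):
    """Hypothetical query-driven routing (an assumption, not the paper's code).

    query_tokens:    list of query token vectors, each of size d_q
    expert_features: one feature vector per expert; sizes may differ
    gate_W:          num_experts x d_q gating matrix
    expert_heads:    per expert, a num_labels x d_e classification matrix
    """
    # The pooled query tokens produce one routing weight per expert.
    weights = softmax(matvec(gate_W, mean_pool(query_tokens)))
    # Each expert maps its own feature space to label logits, so the
    # experts never need to share a representation space.
    logits = [matvec(h, f) for h, f in zip(expert_heads, expert_features)]
    num_labels = len(logits[0])
    fused = [
        sum(w * l[k] for w, l in zip(weights, logits))
        for k in range(num_labels)
    ]
    # Independent sigmoids give multi-label probabilities.
    return weights, [sigmoid(z) for z in fused]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy setup: 4 query tokens of size 8, two experts with feature sizes
# 16 and 32, and 5 disease labels. All values are random placeholders.
rng = random.Random(0)
num_labels, d_q, expert_dims = 5, 8, [16, 32]
query_tokens = rand_matrix(rng, 4, d_q)
expert_features = [[rng.uniform(-1, 1) for _ in range(d)] for d in expert_dims]
gate_W = rand_matrix(rng, len(expert_dims), d_q)
expert_heads = [rand_matrix(rng, num_labels, d) for d in expert_dims]

weights, probs = route_and_fuse(query_tokens, expert_features, gate_W, expert_heads)
```

Fusing at the logit level is one simple way to combine experts with different feature sizes. The actual method may fuse differently, so treat this only as an aid to reading the abstract.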

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Thoracic Disease Classification | MIMIC-CXR (test) | Average AUC 71.92 | 34 |
| Multi-label fundus diagnosis | ODIR-5k | AUC-ROC 0.7258 | 15 |
| Medical Image Classification | CheXpert Plus (test) | AUC 83.55 | 14 |
| Medical Image Classification | MIMIC-CXR (test) | AUC 71.92 | 14 |
| Medical Image Classification | NIH CXR-14 (test) | AUC 66.42 | 14 |
| Medical Image Classification | RSNA Pneu. (test) | AUC 86.26 | 14 |
| Medical Image Classification | COVIDx CXR-4 (test) | AUC 76.37 | 14 |
| Thoracic Disease Classification | CHEXPERT Plus | AUC 79.94 | 10 |
