Adapting Foundation Vision-Language Models to Medical Diagnosis via Query-Driven Expert Bridging

About

Vision-language foundation models achieve promising performance in natural image classification, yet their direct application to medical imaging is limited by severe domain shifts, resolution mismatches, and the multi-label nature of clinical diagnosis. Training dedicated medical foundation models from scratch, however, is costly and data-intensive. Here, we propose MedBridge, a lightweight adaptation framework that opens a new direction in domain-gap mitigation by combining domain alignment, resolution preservation, and multi-label reasoning via complementary VLM experts for medical image diagnosis. Specifically, MedBridge transforms pretrained VLMs into multi-view query encoders that inject a compact set of learnable query tokens into intermediate layers, enabling non-destructive domain alignment while preserving fine-grained pathological cues via multi-view high-resolution sampling. These query tokens further act as routing signals for a mixture-of-experts, dynamically integrating heterogeneous foundation models for multi-label reasoning without requiring a shared representation space. We evaluate MedBridge on five chest radiograph benchmarks across three key adaptation tasks. MedBridge demonstrates superior performance in both cross-domain generalization (out-of-distribution transfer) and in-domain specialization (same-distribution tuning) settings, yielding a significant 6-15% AUC improvement over state-of-the-art adaptation methods for multi-label thoracic disease diagnosis. Furthermore, MedBridge is model-agnostic and extends across eight diverse VLMs (e.g., CLIP, LLaVA, Qwen-VL, MedGemma), highlighting its ability to flexibly adapt arbitrary foundation models into a powerful medical diagnostic tool. Our code will be released upon acceptance.
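The routing idea in the abstract (query tokens acting as gating signals over several expert models, followed by multi-label prediction) can be illustrated with a minimal sketch. This is not the authors' implementation, which has not been released. The function `route_and_fuse`, the mean pooling of query tokens, the linear softmax gate, and the fusion of experts at the logit level are all assumptions chosen for illustration; fusing logits is just one simple way to combine experts that do not share a representation space.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function, used for independent per-label probabilities."""
    return 1.0 / (1.0 + math.exp(-x))


def mean_pool(tokens):
    """Average a list of equal-length token vectors into one vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def route_and_fuse(query_tokens, gate_weights, expert_logits):
    """Hypothetical query-driven gating over expert outputs.

    query_tokens:  list of d-dim vectors (stand-ins for encoded query tokens)
    gate_weights:  one d-dim vector per expert (assumed linear gate)
    expert_logits: one list of per-label logits per expert
    """
    pooled = mean_pool(query_tokens)
    # One gate score per expert, normalized so the gates sum to 1.
    gates = softmax([dot(pooled, w) for w in gate_weights])
    num_labels = len(expert_logits[0])
    # Gate-weighted sum of expert logits, computed label by label.
    fused = [
        sum(g * logits[k] for g, logits in zip(gates, expert_logits))
        for k in range(num_labels)
    ]
    # Multi-label output: each label gets its own sigmoid, not a shared softmax.
    probs = [sigmoid(z) for z in fused]
    return gates, probs


# Toy usage: 3 query tokens of dimension 4, 2 experts, 3 disease labels.
queries = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.5, 0.5, 0.0], [0.2, 0.2, 0.2, 0.2]]
gate_w = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
logits = [[2.0, -1.0, 0.5], [0.0, 1.5, -2.0]]
gates, probs = route_and_fuse(queries, gate_w, logits)
```

Because each label passes through its own sigmoid, several findings can be predicted for the same image, which matches the multi-label setting the abstract describes.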

Yitong Li, Morteza Ghahremani, Christian Wachinger • 2025

Related benchmarks

Task	Dataset	Result
Thoracic Disease Classification	MIMIC-CXR (test)	Average AUC 71.92	34
Multi-label fundus diagnosis	ODIR-5k	AUC-ROC 0.7258	15
Medical Image Classification	CheXpert Plus (test)	AUC 83.55	14
Medical Image Classification	MIMIC-CXR (test)	AUC 71.92	14
Medical Image Classification	NIH CXR-14 (test)	AUC 66.42	14
Medical Image Classification	RSNA Pneu. (test)	AUC 86.26	14
Medical Image Classification	COVIDx CXR-4 (test)	AUC 76.37	14
Thoracic Disease Classification	CHEXPERT Plus	AUC 79.94	10

