Adapting Foundation Vision-Language Models to Medical Diagnosis via Query-Driven Expert Bridging
About
Vision-language foundation models (VLMs) achieve promising performance in natural image classification, yet their direct application to medical imaging is limited by severe domain shifts, resolution mismatches, and the multi-label nature of clinical diagnosis. Training dedicated medical foundation models from scratch, however, is costly and data-intensive. Here, we propose MedBridge, a lightweight adaptation framework that opens a new direction in domain-gap mitigation by jointly addressing domain alignment, resolution preservation, and multi-label reasoning through complementary VLM experts for medical image diagnosis. Specifically, MedBridge turns pretrained VLMs into multi-view query encoders: a compact set of learnable query tokens is injected into intermediate layers, enabling non-destructive domain alignment, while multi-view high-resolution sampling preserves fine-grained pathological cues. These query tokens also act as routing signals for a mixture of experts, dynamically integrating heterogeneous foundation models for multi-label reasoning without requiring a shared representation space. We evaluate MedBridge on five chest radiograph benchmarks across three key adaptation tasks. MedBridge outperforms state-of-the-art adaptation methods in both cross-domain generalization (out-of-distribution transfer) and in-domain specialization (same-distribution tuning), improving AUC by 6-15% for multi-label thoracic disease diagnosis. Furthermore, MedBridge is model-agnostic and extends to eight diverse VLMs (e.g., CLIP, LLaVA, Qwen-VL, MedGemma), showing that it can flexibly adapt arbitrary foundation models into a powerful medical diagnostic tool. Our code will be released upon acceptance.
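The query-driven routing described above can be illustrated with a toy, framework-free sketch. It is not the MedBridge implementation, whose code has not been released, and every name in it (`query_routed_moe`, `gate_vectors`, `label_heads`) is hypothetical. It assumes that each expert VLM has already produced output vectors for its learnable query tokens (gathered across the multi-view crops), that each expert has its own linear gate and linear label head, and that experts are fused with a dense softmax gate in label-logit space. Because gate scores and label logits are computed per expert before fusion, experts with different feature widths never need a shared representation space.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over one score per expert."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mean_pool(tokens: Sequence[Vector]) -> Vector:
    """Average one expert's query-token outputs (assumed gathered across all views)."""
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def query_routed_moe(
    expert_queries: Sequence[Sequence[Vector]],  # per expert: query outputs of width d_i
    gate_vectors: Sequence[Vector],  # per expert: hypothetical gate of width d_i
    label_heads: Sequence[Matrix],  # per expert: hypothetical head, num_labels x d_i
) -> Vector:
    """Hypothetical query-routed mixture of heterogeneous experts for multi-label output."""
    pooled = [mean_pool(q) for q in expert_queries]
    # Query-driven routing: each expert scores itself from its own pooled queries.
    gate = softmax([dot(g, p) for g, p in zip(gate_vectors, pooled)])
    num_labels = len(label_heads[0])
    fused = [0.0] * num_labels
    for weight, head, p in zip(gate, label_heads, pooled):
        for k, row in enumerate(head):
            fused[k] += weight * dot(row, p)  # mix in label-logit space
    # Multi-label output: an independent sigmoid per finding, not a softmax across findings.
    return [sigmoid(z) for z in fused]


# Toy usage: two experts with different widths (2 and 3), two query tokens each, 2 findings.
probs = query_routed_moe(
    expert_queries=[
        [[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]],
    ],
    gate_vectors=[[0.0, 0.0], [0.0, 0.0, 0.0]],  # zero gates -> equal 0.5/0.5 weighting
    label_heads=[
        [[2.0, 0.0], [0.0, -2.0]],
        [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
    ],
)
print(probs)  # probs ≈ [0.6225, 0.5]
```

A dense softmax gate is the simplest reading of "query tokens as routing signals"; sparse top-k routing or per-finding gates would be equally plausible, and the abstract does not say which MedBridge uses.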
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Thoracic Disease Classification | MIMIC-CXR (test) | Average AUC | 71.92 | 34 |
| Multi-label Fundus Diagnosis | ODIR-5k | AUC-ROC | 0.7258 | 15 |
| Medical Image Classification | CheXpert Plus (test) | AUC | 83.55 | 14 |
| Medical Image Classification | MIMIC-CXR (test) | AUC | 71.92 | 14 |
| Medical Image Classification | NIH CXR-14 (test) | AUC | 66.42 | 14 |
| Medical Image Classification | RSNA Pneumonia (test) | AUC | 86.26 | 14 |
| Medical Image Classification | COVIDx CXR-4 (test) | AUC | 76.37 | 14 |
| Thoracic Disease Classification | CheXpert Plus | AUC | 79.94 | 10 |
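Most results above are AUC values, and the MIMIC-CXR thoracic entry is an average over findings. The page does not state how that average is taken; a common convention for multi-label chest X-ray benchmarks, assumed here, is the macro average: compute ROC AUC separately for each finding, then take the unweighted mean. The sketch below does this in plain Python with the pairwise (Mann-Whitney) form of AUC, counting tied scores as one half; the function names are illustrative only.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when a finding has only one class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(
    label_matrix: Sequence[Sequence[int]], score_matrix: Sequence[Sequence[float]]
) -> float:
    """Unweighted mean of per-finding AUCs; rows are images, columns are findings."""
    num_labels = len(label_matrix[0])
    aucs: List[float] = []
    for k in range(num_labels):
        y = [row[k] for row in label_matrix]
        s = [row[k] for row in score_matrix]
        aucs.append(binary_auc(y, s))
    return sum(aucs) / num_labels


# Toy usage: 4 images, 2 findings (multi-label: an image may be positive for both).
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.4], [0.8, 0.6], [0.1, 0.7]]
print(macro_auc(labels, scores))  # 0.75 (finding 0: 1.0, finding 1: 0.5)
```

The pairwise loop is quadratic in the number of images, which is fine for illustration; for real test sets a sort-based rank computation gives the same value faster.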