
Improvise, Adapt, Overcome -- Telescopic Adapters for Efficient Fine-tuning of Vision Language Models in Medical Imaging

About

Adapting Vision Language Segmentation Models (VLSMs) to medical imaging domains requires significant computational overhead when using conventional fine-tuning approaches. Existing Parameter-Efficient Fine-Tuning (PEFT) methods apply uniform adapter dimensions across all transformer layers, leading to suboptimal parameter allocation and reduced adaptation efficiency. We introduce Telescopic Adapters, a novel PEFT framework that employs depth-aware scaling to progressively increase adapter capacity from shallow to deep transformer layers. Our method integrates lightweight bottleneck modules within CLIPSeg's vision and text encoders, with adapter dimensions dynamically scaled based on layer depth and semantic relevance. Using only 613k trainable parameters (244x fewer than end-to-end fine-tuning), Telescopic Adapters achieve superior performance across five diverse medical datasets spanning polyp segmentation, skin lesion detection, and breast ultrasound imaging. Comprehensive ablation studies demonstrate that deeper layers require substantially more adaptation capacity than shallow layers, validating our telescopic scaling hypothesis. Our approach establishes a new paradigm for efficient medical VLSM fine-tuning, enabling deployment in resource-constrained clinical environments while maintaining competitive segmentation accuracy.
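The core idea, depth-aware bottleneck sizing, can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the linear growth schedule, the base/top bottleneck widths, and the hidden size of 512 are all assumptions chosen for the example; the abstract states only that adapter capacity grows with layer depth and that the full budget is about 613k trainable parameters.

```python
# Hypothetical sketch of "telescopic" depth-aware adapter sizing.
# Assumptions (not from the paper): a linear schedule from a base
# width of 4 to a top width of 64 over 12 layers, hidden size 512.

def telescopic_dims(num_layers=12, base=4, top=64):
    """Bottleneck width per layer, growing linearly with depth."""
    step = (top - base) / (num_layers - 1)
    return [round(base + i * step) for i in range(num_layers)]

def adapter_param_count(hidden=512, dims=None):
    """Total parameters of the down/up projections (with biases)
    across all per-layer bottleneck adapters."""
    dims = dims if dims is not None else telescopic_dims()
    total = 0
    for r in dims:
        down = hidden * r + r        # project hidden -> r
        up = r * hidden + hidden     # project r -> hidden
        total += down + up
    return total

print(telescopic_dims())       # shallow layers get narrow bottlenecks
print(adapter_param_count())   # parameter budget under these assumptions
```

Under this sketch, shallow layers receive only a few bottleneck dimensions while deep layers receive an order of magnitude more, mirroring the ablation finding that deeper layers need substantially more adaptation capacity.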

Ujjwal Mishra, Vinita Shukla, Praful Hambarde, Amit Shukla • 2025

Related benchmarks

Task                       | Dataset           | Metric           | Result | Rank
Medical Image Segmentation | BUSI (test)       | Dice             | 65.9   | 121
Binary Segmentation        | Kvasir-SEG (test) | DSC              | 0.8979 | 67
Image Segmentation         | ISIC 2016 (test)  | Dice Coefficient | 92.18  | 40
Semantic Segmentation      | BKAI (test)       | DSC              | 88.38  | 13
Semantic Segmentation      | ClinicDB (test)   | DSC (%)          | 91.67  | 13
