T-Gated Adapter: A Lightweight Temporal Adapter for Vision-Language Medical Segmentation

About

Medical image segmentation traditionally relies on fully supervised 3D architectures that demand large amounts of dense, voxel-level annotation from clinical experts, a prohibitively expensive process. Vision-Language Models (VLMs) offer a powerful alternative by leveraging broad visual semantic representations learned from billions of images. However, when applied independently to each 2D slice of a 3D scan, these models often produce noisy, anatomically implausible segmentations that violate the inherent continuity of anatomical structures. We propose a temporal adapter that addresses this by injecting adjacent-slice context directly into the model's visual token representations. The adapter comprises a temporal transformer that attends across a fixed context window at the token level, a spatial context block that refines within-slice representations, and an adaptive gate that balances temporal and single-slice features. Trained on 30 labeled volumes from the FLARE22 dataset, our method achieves a mean Dice of 0.704 across 13 abdominal organs, a gain of +0.206 over the baseline VLM trained with no temporal context. Zero-shot evaluation on the BTCV and AMOS22 datasets yields consistent improvements of +0.210 and +0.230, respectively, and reduces the average cross-domain performance drop from 38.0% to 24.9%. Furthermore, in a cross-modality evaluation on AMOS22 MRI, with neither model receiving any MRI supervision, our method achieves a mean Dice of 0.366, outperforming a fully supervised 3D baseline (DynUNet, 0.224) trained exclusively on CT. This suggests that CLIP's visual semantic representations generalize more gracefully across imaging modalities than convolutional features.
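
The abstract does not give the form of the adaptive gate, so the following is only a minimal sketch of one common way such a gate could work, not the authors' implementation. It assumes the temporal features for a token are already computed, and it blends them with the single-slice features through a scalar sigmoid gate. The function name `gated_fusion` and the parameters `gate_weights` and `gate_bias` are hypothetical.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(single, temporal, gate_weights, gate_bias=0.0):
    """Blend one token's single-slice and temporal features (illustrative only).

    single, temporal: lists of floats with the same length d.
    gate_weights: list of 2*d floats, a hypothetical learned projection
        applied to the concatenation [single, temporal].
    Returns (fused_features, gate_value).
    """
    if len(single) != len(temporal):
        raise ValueError("single and temporal must have the same length")
    if len(gate_weights) != 2 * len(single):
        raise ValueError("gate_weights must have length 2 * d")

    # Scalar gate computed from both feature vectors (an assumption).
    concat = list(single) + list(temporal)
    logit = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    g = sigmoid(logit)

    # g near 1 favors temporal context; g near 0 keeps the single-slice token.
    fused = [g * t + (1.0 - g) * s for s, t in zip(single, temporal)]
    return fused, g


if __name__ == "__main__":
    fused, g = gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0, 0.0, 0.0, 0.0])
    print(g, fused)
```

With all-zero gate weights and zero bias the gate sits at 0.5 and the output is the plain average of the two feature vectors; a strongly negative bias pushes the output back toward the single-slice token, which is the behavior one would want on slices where neighboring context is unhelpful.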

Pranjal Khadka • 2026

Related benchmarks

Task	Dataset	Result
Abdominal Organ Segmentation	FLARE 22 (test)	Mean Dice 70.4	16
Multi-organ abdominal segmentation	AMOS CT 22	Avg Dice Score 51.3	5
Abdominal Organ Segmentation	BTCV	Mean Dice 54.4	2
Segmentation	AMOS MRI 22 (10 volumes)	Dice Score 36.6	2
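
The cross-domain drop quoted in the abstract can be checked against the Dice scores in the table. The sketch below assumes that "performance drop" means the relative decrease in mean Dice from FLARE22 to each zero-shot CT dataset, averaged over BTCV and AMOS22 CT, and that the baseline scores are the reported scores minus the stated gains. Under those assumptions it reproduces the reported 24.9% and 38.0% to within rounding. The helper names are hypothetical.

```python
def relative_drop(in_domain, out_domain):
    """Relative decrease in Dice, in percent, from in-domain to out-of-domain."""
    return 100.0 * (in_domain - out_domain) / in_domain


def mean_drop(in_domain, out_scores):
    """Average relative drop over several out-of-domain datasets."""
    drops = [relative_drop(in_domain, s) for s in out_scores]
    return sum(drops) / len(drops)


# Reported mean Dice of the proposed method (0-1 scale).
ours = {"flare22": 0.704, "btcv": 0.544, "amos22_ct": 0.513}

# Reported gains over the baseline VLM; baseline = ours - gain (assumption).
gains = {"flare22": 0.206, "btcv": 0.210, "amos22_ct": 0.230}
baseline = {name: ours[name] - gains[name] for name in ours}

ours_drop = mean_drop(ours["flare22"], [ours["btcv"], ours["amos22_ct"]])
baseline_drop = mean_drop(
    baseline["flare22"], [baseline["btcv"], baseline["amos22_ct"]]
)

if __name__ == "__main__":
    print(f"ours: {ours_drop:.1f}%  baseline: {baseline_drop:.1f}%")
```

Because the inputs are themselves rounded to three decimals, the baseline figure lands just above 38.0% rather than exactly on it.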

