Gated Differential Linear Attention: A Linear-Time Decoder for High-Fidelity Medical Segmentation

About

Medical image segmentation requires models that preserve fine anatomical boundaries while remaining practical for clinical deployment. Transformers capture long-range dependencies but incur quadratic attention cost, whereas CNNs are efficient but less effective at global reasoning. Linear attention offers \(\mathcal{O}(N)\) scaling but often produces diffuse feature aggregation that weakens boundary-sensitive prediction. We introduce a gated differential linear-attention mixer for medical image segmentation. Its global path, Gated Differential Linear Attention (GDLA), subtracts the outputs of two kernelized attention branches computed over complementary query/key subspaces to suppress redundant responses, and applies a data-dependent gate to refine tokens. A parallel local token-mixing branch based on depthwise convolution strengthens neighborhood interactions, and the two branches are fused while preserving \(\mathcal{O}(N)\) complexity. When the mixer is instantiated in a pretrained Pyramid Vision Transformer (PVT)-based encoder-decoder, the resulting model achieves state-of-the-art results on the evaluated 2D medical segmentation benchmarks spanning CT, MRI, ultrasound, and dermoscopy, with a favorable accuracy-efficiency trade-off over closely related baselines. The code is publicly available at https://github.com/xmindflow/gdla.
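
For a concrete picture of the mixer, here is a minimal NumPy sketch built on several assumptions that the abstract does not specify: the elu(x)+1 feature map for the kernelized branches, splitting the query/key channels into two halves to form the "complementary subspaces", a fixed scalar `lam` weighting the subtracted branch, a sigmoid gate computed from the input tokens, and additive fusion of the global and local paths followed by an output projection. All function and parameter names (`gdla_mixer`, `linear_attention`, `depthwise_conv3x3`, `w_q`, `w_g`, `dw`, and so on) are hypothetical. This is an illustration of the general idea, not the authors' implementation, which is in the linked repository.

```python
import numpy as np


def phi(x):
    # Positive feature map elu(x) + 1 (an assumed kernel choice, common in linear attention).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v, eps=1e-6):
    # Kernelized attention: phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1).
    # Forming the (d, d_v) summary phi(K)^T V first keeps the cost linear in N.
    fq, fk = phi(q), phi(k)            # (N, d)
    kv = fk.T @ v                       # (d, d_v)
    z = fq @ fk.sum(axis=0)             # (N,) per-query normalizer
    return (fq @ kv) / (z[:, None] + eps)


def depthwise_conv3x3(x, h, w, kernels):
    # Depthwise 3x3 convolution (zero padding, one filter per channel) over tokens
    # x of shape (h*w, C) laid out row-major on an h x w grid; kernels: (C, 3, 3).
    c = x.shape[1]
    grid = x.reshape(h, w, c)
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(grid)
    for di in range(3):
        for dj in range(3):
            out += padded[di:di + h, dj:dj + w, :] * kernels[:, di, dj]
    return out.reshape(h * w, c)


def gdla_mixer(x, h, w, params, lam=0.5):
    # Hypothetical GDLA-style mixer; x: (N, C) tokens from an h x w feature map.
    d = params["w_q"].shape[1] // 2
    q, k, v = x @ params["w_q"], x @ params["w_k"], x @ params["w_v"]
    # Global path: two linear-attention branches on complementary halves of the
    # query/key channels (assumed split), combined by differential subtraction.
    a1 = linear_attention(q[:, :d], k[:, :d], v)
    a2 = linear_attention(q[:, d:], k[:, d:], v)
    diff = a1 - lam * a2
    gate = 1.0 / (1.0 + np.exp(-(x @ params["w_g"])))  # data-dependent sigmoid gate
    global_out = gate * diff
    # Local path: depthwise convolution for neighborhood token mixing.
    local_out = depthwise_conv3x3(x, h, w, params["dw"])
    # Additive fusion followed by an output projection (assumed fusion rule).
    return (global_out + local_out) @ params["w_o"]


rng = np.random.default_rng(0)
h, w, c, d = 8, 8, 16, 8
params = {
    "w_q": rng.normal(scale=0.1, size=(c, 2 * d)),
    "w_k": rng.normal(scale=0.1, size=(c, 2 * d)),
    "w_v": rng.normal(scale=0.1, size=(c, c)),
    "w_g": rng.normal(scale=0.1, size=(c, c)),
    "dw": rng.normal(scale=0.1, size=(c, 3, 3)),
    "w_o": rng.normal(scale=0.1, size=(c, c)),
}
x = rng.normal(size=(h * w, c))
y = gdla_mixer(x, h, w, params)
print(y.shape)  # (64, 16)
```

The step that carries the \(\mathcal{O}(N)\) claim in this sketch is inside `linear_attention`: multiplying phi(K)^T by V first yields a d × d_v summary, so the cost is O(N·d·d_v) rather than the O(N²·d) of materializing the full attention matrix. The depthwise convolution is also linear in the number of tokens, so the fused mixer stays linear in N.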

Hongbo Zheng, Afshin Bozorgpour, Dorit Merhof, Minjia Zhang• 2026

Related benchmarks

Task	Dataset	Metric	Result	
Medical Image Segmentation	BUSI	Dice Score	80.54	143
Skin Lesion Segmentation	PH2	DIC	0.9559	92
Medical Image Segmentation	Synapse	Average DSC	85.32	77
Cardiac Segmentation	ACDC	RV Score	91.3	68
Skin Lesion Segmentation	HAM10000	Dice Coefficient	95.01	39
