Fuse4Seg: Image Fusion for Multi-Modal Medical Segmentation via Bi-level Optimization
About
Multi-modal medical image fusion is traditionally optimized for human visual perception, aiming to maximize generic contrast and structural fidelity. When these visually pleasing fused images are deployed in automated clinical workflows, however, this visual-semantic discrepancy causes task-agnostic feature degradation, inadvertently smoothing out critical high-frequency tumor boundaries. To bridge this semantic gap, we propose Fuse4Seg, a framework that reformulates multi-modal fusion as a cooperative bi-level optimization problem with medical segmentation. Rather than relying on rigid visual metrics, the fusion leader dynamically updates its feature extraction strategy, driven directly by semantic gradients backpropagated from the downstream segmentation follower. To guarantee physical fidelity alongside semantic utility, we design a frequency-decoupled architecture regularized by a Frequency Decomposition Loss and a Spatial Gradient Loss. This explicit physical anchor prevents anatomical distortion and preserves task-critical details. Extensive experiments demonstrate that the resulting task-aware, single-channel fused prior generalizes across diverse multi-scale modalities and surpasses state-of-the-art dual-channel segmentation methods, while providing a readable, "glass-box" physical image that fosters clinical interpretability and trust.
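The two physical regularizers can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: it assumes the Frequency Decomposition Loss ties the fused image's low-frequency band to the average of the sources and its high-frequency band to the sharpest source detail, and that the Spatial Gradient Loss keeps the strongest per-pixel gradient; the filter choice (a box blur) and the aggregation rules are illustrative assumptions.

```python
import numpy as np

def box_blur(img, k=3):
    """Simple k x k box-blur low-pass filter (edge padding); stand-in for
    whatever low-pass decomposition the actual architecture uses."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def frequency_decomposition_loss(fused, src_a, src_b):
    """Hypothetical Frequency Decomposition Loss: L1 penalties tying the
    fused image's low band to the mean source structure and its high band
    to the strongest source detail."""
    low_f = box_blur(fused)
    high_f = fused - low_f
    low_t = 0.5 * (box_blur(src_a) + box_blur(src_b))              # shared anatomy
    high_t = np.maximum(src_a - box_blur(src_a),                   # sharpest detail
                        src_b - box_blur(src_b))
    return np.abs(low_f - low_t).mean() + np.abs(high_f - high_t).mean()

def spatial_gradient_loss(fused, src_a, src_b):
    """Hypothetical Spatial Gradient Loss: the fused image should retain
    the strongest per-pixel gradient magnitude of either source."""
    def grad_mag(img):
        gy, gx = np.gradient(img.astype(float))
        return np.hypot(gx, gy)
    return np.abs(grad_mag(fused)
                  - np.maximum(grad_mag(src_a), grad_mag(src_b))).mean()
```

In the bi-level loop, these terms regularize the fusion leader while the segmentation follower's loss supplies the semantic gradient; both losses vanish when the fused image already carries the sources' structure and edges.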
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Segmentation | BraTS 2021 | Dice (ET) | 93.7 | 18 |
| Medical image fusion | Harvard MRI-SPECT | Entropy (EN) | 7.522 | 7 |
| Medical image fusion | Harvard MRI-PET | Entropy (EN) | 7.45 | 7 |