Fuse4Seg: Image Fusion for Multi-Modal Medical Segmentation via Bi-level Optimization
About
Multi-modal medical image fusion is traditionally optimized for human visual perception, aiming to maximize generic contrast and structural fidelity. When these visually pleasing fused images are deployed in automated clinical workflows, however, this visual-semantic discrepancy causes task-agnostic feature degradation, inadvertently smoothing out critical high-frequency tumor boundaries. To bridge this semantic gap, we propose Fuse4Seg, a framework that reformulates multi-modal fusion as a cooperative bi-level optimization problem with medical segmentation. Rather than relying on rigid visual metrics, the fusion leader dynamically updates its feature extraction strategy, driven directly by semantic gradients backpropagated from the downstream segmentation follower. To guarantee physical fidelity alongside semantic utility, we design a frequency-decoupled architecture regularized by a Frequency Decomposition Loss and a Spatial Gradient Loss. This explicit physical anchor prevents anatomical distortion and preserves task-critical details. Extensive experiments demonstrate that the resulting task-aware, single-channel fused prior generalizes across diverse multi-scale modalities and surpasses state-of-the-art dual-channel segmentation methods, while providing a readable, "glass-box" physical image that fosters clinical interpretability and trust.
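The two physical regularizers can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: it assumes the Frequency Decomposition Loss ties the fused image's low-frequency band to the average of the sources and its high-frequency band to the sharpest source detail, and that the Spatial Gradient Loss keeps the strongest per-pixel gradient; the filter choice (a box blur) and the aggregation rules are illustrative assumptions.

```python
import numpy as np

def box_blur(img, k=3):
    """Simple k x k box-blur low-pass filter (edge padding); stand-in for
    whatever low-pass decomposition the actual architecture uses."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def frequency_decomposition_loss(fused, src_a, src_b):
    """Hypothetical Frequency Decomposition Loss: L1 penalties tying the
    fused image's low band to the mean source structure and its high band
    to the strongest source detail."""
    low_f = box_blur(fused)
    high_f = fused - low_f
    low_t = 0.5 * (box_blur(src_a) + box_blur(src_b))              # shared anatomy
    high_t = np.maximum(src_a - box_blur(src_a),                   # sharpest detail
                        src_b - box_blur(src_b))
    return np.abs(low_f - low_t).mean() + np.abs(high_f - high_t).mean()

def spatial_gradient_loss(fused, src_a, src_b):
    """Hypothetical Spatial Gradient Loss: the fused image should retain
    the strongest per-pixel gradient magnitude of either source."""
    def grad_mag(img):
        gy, gx = np.gradient(img.astype(float))
        return np.hypot(gx, gy)
    return np.abs(grad_mag(fused)
                  - np.maximum(grad_mag(src_a), grad_mag(src_b))).mean()
```

In the bi-level loop, these terms regularize the fusion leader while the segmentation follower's loss supplies the semantic gradient; both losses vanish when the fused image already carries the sources' structure and edges.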
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Segmentation | BraTS 2021 | Dice (ET) | 93.7 | 18 |
| Medical image fusion | Harvard MRI-SPECT | Entropy (EN) | 7.522 | 7 |
| Medical image fusion | Harvard MRI-PET | Entropy (EN) | 7.45 | 7 |