Why Invariance is Not Enough for Biomedical Domain Generalization and How to Fix It
About
We present DropGen, a simple and theoretically grounded approach for domain generalization in 3D biomedical image segmentation. Modern segmentation models degrade sharply under shifts in modality, disease severity, clinical site, and other factors, and this brittleness limits reliable deployment. Existing domain generalization methods rely on extreme augmentations, mixing of domain statistics, or architectural redesigns, but they incur significant implementation overhead and yield inconsistent performance across biomedical settings. DropGen instead proposes a principled learning strategy with minimal overhead that leverages both source-domain image intensities and domain-stable foundation model representations to train robust segmentation models. As a result, DropGen achieves strong gains in both fully supervised and few-shot segmentation across a broad range of shifts in biomedical studies. Unlike prior approaches, DropGen is architecture- and loss-agnostic, compatible with standard augmentation pipelines, computationally lightweight, and applicable to arbitrary anatomical regions. Our implementation is freely available at https://github.com/sebodiaz/DropGen.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Medical Image Segmentation | AMOS (test) | DSC | 67.5 | 34 |
| 3D Medical Image Segmentation | MSD-BraTS FLAIR MRI (test) | Mean Dice Score | 46.2 | 10 |
| 3D Medical Image Segmentation | TopCoW MR angiography (MRA) (test) | Mean Dice | 47.8 | 10 |
| 3D Medical Image Segmentation | Prostate Multi-site collection (test) | Mean Dice | 84.7 | 10 |
| Domain Generalization Performance Ranking | Aggregate (AMOS, BraTS, CoW, HVSMR, PanDG, Prostate) | Average Rank | 1.5 | 10 |
| 3D Medical Image Segmentation | PanDG Out-of-phase scans (test) | Mean Dice | 46 | 10 |
| 3D Medical Image Segmentation | HVSMR (test) | Mean Dice | 65.8 | 10 |
| Medical Image Segmentation | AMOS few-shot | Mean Dice Score | 0.534 | 9 |
| Medical Image Segmentation | CoW few-shot | Mean Dice Score | 40.3 | 9 |
| Medical Image Segmentation | HVSMR few-shot | Mean Dice Score | 46.3 | 9 |
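
Most entries above report a Dice score (DSC), the standard overlap metric for segmentation: twice the number of voxels shared by prediction and ground truth, divided by the total number of voxels in the two masks. The following is a minimal sketch of that general definition in plain Python, not the evaluation code behind these numbers. The function names `dice_score` and `mean_dice`, the flattened-label-map input, and the convention for a label absent from both maps are all assumptions made for illustration.

```python
from typing import Iterable, Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], label: int) -> float:
    """Dice overlap for one label between two flattened label maps."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")
    pred_count = sum(1 for p in pred if p == label)
    target_count = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    denom = pred_count + target_count
    # Assumed convention: a label absent from both maps counts as a perfect match.
    return 1.0 if denom == 0 else 2.0 * overlap / denom


def mean_dice(pred: Sequence[int], target: Sequence[int], labels: Iterable[int]) -> float:
    """Unweighted mean of per-label Dice scores over the given foreground labels."""
    labels = list(labels)
    return sum(dice_score(pred, target, lab) for lab in labels) / len(labels)


# Toy example: a six-voxel "volume" with background 0 and two structures (1 and 2).
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 0, 2, 2, 2]
print(f"Dice for label 1: {dice_score(pred, target, 1):.3f}")
print(f"Mean Dice over labels 1 and 2: {mean_dice(pred, target, [1, 2]):.3f}")
```

A 3D label volume can be flattened before calling these functions. The sketch returns a fraction in [0, 1]; multiplying by 100 gives a percentage.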