SynerMedGen: Synergizing Medical Multimodal Understanding with Generation via Task Alignment
About
Unifying multimodal understanding and generation is a compelling frontier that is beginning to emerge in the medical field. However, the few existing unified medical models typically treat understanding and generation as disjoint objectives and lack meaningful functional synergy. In this work, we identify and address a critical question in unified medical modeling: what form of understanding truly benefits generation? We present SynerMedGen, a unified framework built on the proposed principle of generation-aligned understanding, which synergizes understanding objectives with generation tasks via task alignment. SynerMedGen introduces three generation-aligned understanding tasks and a two-stage training strategy that transfers generation-beneficial representations learned during understanding training to medical image synthesis. Remarkably, even with understanding training alone, SynerMedGen achieves strong zero-shot performance across 22 medical image synthesis tasks and demonstrates robust generalization to unseen datasets. When combined with generation training, SynerMedGen consistently outperforms state-of-the-art specialized medical image synthesis models as well as recent unified medical models. We also release SynerMed, a large-scale dataset consisting of 1M paired synthesis samples and 2M generation-derived understanding instances, to support further research on understanding-generation synergy. The project is available at https://github.com/Mhilab/SynerMedGen.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Medical Image Synthesis | BraTS | SSIM | 92.45 | 108 |
| CBCT to CT Image Synthesis | SynthRAD Brain 2023 | MAE | 20.21 | 12 |
| CBCT to CT Image Synthesis | SynthRAD Pelvis 2023 | MAE | 19.31 | 12 |
| CT to CBCT Image Synthesis | SynthRAD2023 Brain | MAE | 34.29 | 12 |
| CT to MRI Image Synthesis | SynthRAD Brain 2023 | MAE | 32.71 | 12 |
| CT to PET Image Synthesis | AutoPET Whole-Body | MAE | 1.57 | 12 |
| Image Synthesis | BraTS | Error T1->T2 | 4.51 | 12 |
| Image Synthesis | SynthRAD Brain, CBCT to CT 2023 | PSNR | 35.03 | 12 |
| Image Synthesis | SynthRAD2023 Brain, CT to CBCT | PSNR | 26.97 | 12 |
| Image Synthesis | SynthRAD Pelvis CBCT to CT 2023 | PSNR | 34.51 | 12 |
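
The table reports MAE and PSNR without stating the evaluation protocol (intensity units, masking, or the data range used for PSNR). As a reading aid, the following is a minimal sketch of the standard textbook definitions of both metrics, not the authors' evaluation code. The function names, the flat-list input format, and the `data_range` value are assumptions made for illustration. Lower MAE is better; higher PSNR is better.

```python
import math


def mae(reference, synthesized):
    """Mean absolute error between two equally sized flat intensity sequences."""
    if len(reference) != len(synthesized) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - s) for r, s in zip(reference, synthesized)) / len(reference)


def psnr(reference, synthesized, data_range):
    """Peak signal-to-noise ratio in dB.

    data_range is the assumed span of valid intensities (hypothetical here;
    the source does not say which range its PSNR values use).
    """
    if len(reference) != len(synthesized) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthesized)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy data: four voxels, each off by 10 intensity units.
ref = [0.0, 100.0, 200.0, 300.0]
syn = [10.0, 90.0, 210.0, 290.0]

print(mae(ref, syn))                      # 10.0
print(psnr(ref, syn, data_range=300.0))   # 10 * log10(900)
```

Because PSNR depends directly on the chosen `data_range`, values are only comparable across papers when the same range and normalization are used.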