WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image Synthesis
About
Due to the three-dimensional nature of CT and MR scans, generative modeling of medical images is a particularly challenging task. Existing approaches mostly apply patch-wise, slice-wise, or cascaded generation techniques to fit the high-dimensional data into limited GPU memory. However, these approaches may introduce artifacts and potentially restrict the model's applicability for certain downstream tasks. This work presents WDM, a wavelet-based medical image synthesis framework that applies a diffusion model to wavelet-decomposed images. The presented approach is a simple yet effective way of scaling 3D diffusion models to high resolutions and can be trained on a single 40 GB GPU. Experimental results on BraTS and LIDC-IDRI unconditional image generation at a resolution of 128 × 128 × 128 demonstrate state-of-the-art image fidelity (FID) and sample diversity (MS-SSIM) scores compared to recent GANs, Diffusion Models, and Latent Diffusion Models. Our proposed method is the only one capable of generating high-quality images at a resolution of 256 × 256 × 256, outperforming all compared methods.
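To make the idea of running a model on wavelet-decomposed volumes concrete, the following is a minimal NumPy sketch of a single-level 3D wavelet transform and its inverse. It assumes an orthonormal Haar wavelet and stacks the eight subbands as channels; the abstract does not specify the wavelet or the channel layout, so both are illustrative assumptions, and the function names (`haar_dwt_3d`, `haar_idwt_3d`) are hypothetical helpers rather than the authors' code.

```python
import numpy as np


def haar_dwt_1d(x, axis):
    """One level of the orthonormal Haar transform along a single axis."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    low = (even + odd) / np.sqrt(2.0)   # local averages
    high = (even - odd) / np.sqrt(2.0)  # local differences
    return low, high


def haar_idwt_1d(low, high, axis):
    """Invert haar_dwt_1d by interleaving the reconstructed samples."""
    even = (low + high) / np.sqrt(2.0)
    odd = (low - high) / np.sqrt(2.0)
    shape = list(low.shape)
    shape[axis] *= 2
    out = np.empty(shape, dtype=low.dtype)
    idx_even = [slice(None)] * out.ndim
    idx_odd = [slice(None)] * out.ndim
    idx_even[axis] = slice(0, None, 2)
    idx_odd[axis] = slice(1, None, 2)
    out[tuple(idx_even)] = even
    out[tuple(idx_odd)] = odd
    return out


def haar_dwt_3d(volume):
    """Split a (D, H, W) volume with even side lengths into 8 subbands.

    Returns an array of shape (8, D/2, H/2, W/2). Stacking the subbands
    as channels is an assumption made for this sketch.
    """
    bands = [volume]
    for axis in range(3):
        next_bands = []
        for band in bands:
            low, high = haar_dwt_1d(band, axis)
            next_bands.extend([low, high])
        bands = next_bands
    return np.stack(bands, axis=0)


def haar_idwt_3d(coeffs):
    """Reassemble the full-resolution volume from the 8 stacked subbands."""
    bands = list(coeffs)
    for axis in (2, 1, 0):  # undo the forward passes in reverse order
        bands = [
            haar_idwt_1d(bands[i], bands[i + 1], axis)
            for i in range(0, len(bands), 2)
        ]
    return bands[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.standard_normal((16, 16, 16))
    coeffs = haar_dwt_3d(volume)
    restored = haar_idwt_3d(coeffs)
    print(coeffs.shape, np.allclose(volume, restored))
```

Under these assumptions, a 128 × 128 × 128 volume becomes eight 64 × 64 × 64 subbands. The total number of values is unchanged, but a network operating on the coefficients works at half the spatial resolution per axis, and the transform is exactly invertible, so a generated set of coefficients can be mapped back to a full-resolution volume.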
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Medical Image Synthesis | 3D MRI (test) | FID | 0.3073 | 36 |
| Medical Image Synthesis | VoCo 10k (train/test) | FID | 0.9668 | 16 |
| Brain Age Prediction | Brain Age ≥ 44 (test) | Absolute Error | 6.36 | 15 |
| Brain Age Prediction | Brain Age ≥ 44 (train) | Absolute Error | 1.63 | 15 |
| Region-Based Anatomical Plausibility | Brain MRIs 95 Regions of Interest (test) | iMAE | 47.52 | 11 |
| Image Generation | Brain MRI | RadFID | 0.39 | 7 |
| Medical Image Generation | MRI Medical Imaging (val) | KID (Brain) | 0.138 | 7 |
| Medical Image Generation | CT Chest | RadFID | 0.92 | 6 |
| Medical Image Generation | CT Medical Imaging (val) | KID (Chest) | 0.227 | 6 |