MedVAR: Towards Scalable and Efficient Medical Image Generation via Next-scale Autoregressive Prediction
About
Medical image generation is pivotal in applications like data augmentation for low-resource clinical tasks and privacy-preserving data sharing. However, a scalable generative backbone for medical imaging requires architectural efficiency, sufficient multi-organ data, and principled evaluation, and current approaches leave all three unresolved. Therefore, we introduce MedVAR, the first autoregressive foundation model that adopts the next-scale prediction paradigm to enable fast and scale-up-friendly medical image synthesis. MedVAR generates images in a coarse-to-fine manner and produces structured multi-scale representations suitable for downstream use. To support hierarchical generation, we curate a harmonized dataset of around 440,000 CT and MRI images spanning six anatomical regions. Comprehensive experiments across fidelity, diversity, and scalability show that MedVAR achieves state-of-the-art generative performance and offers a promising architectural direction for future medical generative foundation models.
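The coarse-to-fine idea can be illustrated with a toy multi-scale residual decomposition. The sketch below is not MedVAR's implementation: the function names, the scale schedule `(1, 2, 4, 8)`, and the average-pool / nearest-neighbour resampling are all assumptions chosen for illustration. It only shows how an image can be represented as a sequence of maps at increasing resolution, each adding detail the coarser ones missed. In a next-scale autoregressive model, each map would be predicted by a network conditioned on the coarser maps rather than computed from a known image.

```python
def downsample(grid, size):
    """Average-pool a square grid to size x size (side length must be divisible by size)."""
    f = len(grid) // size
    return [[sum(grid[i * f + a][j * f + b] for a in range(f) for b in range(f)) / (f * f)
             for j in range(size)] for i in range(size)]


def upsample(grid, size):
    """Nearest-neighbour upsample a square grid to size x size."""
    f = size // len(grid)
    return [[grid[i // f][j // f] for j in range(size)] for i in range(size)]


def decompose(image, scales=(1, 2, 4, 8)):
    """Split a square image into residual maps, coarse to fine (hypothetical schedule).

    Returns the list of maps and the running reconstruction after the last scale.
    """
    full = len(image)
    recon = [[0.0] * full for _ in range(full)]
    maps = []
    for s in scales:
        # What the coarser scales have not yet explained.
        residual = [[image[i][j] - recon[i][j] for j in range(full)] for i in range(full)]
        r = downsample(residual, s)
        maps.append(r)
        # Add this scale's contribution back at full resolution.
        up = upsample(r, full)
        recon = [[recon[i][j] + up[i][j] for j in range(full)] for i in range(full)]
    return maps, recon


# Toy 8x8 "image" with a deterministic pattern.
image = [[float((3 * i + 5 * j) % 7) for j in range(8)] for i in range(8)]
maps, recon = decompose(image)
```

Because the last scale in this schedule equals the full resolution, the final residual map captures whatever remains, so `recon` matches `image` up to floating-point error. The first map is a single value: the image mean.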
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Medical Image Generation | Medical Image Dataset | Efficiency Score | 11.94 | 25 |
| Image Generation | Brain MRI | RadFID | 0.19 | 7 |
| Medical Image Generation | MRI Medical Imaging (val) | KID (Brain) | 0.018 | 7 |
| Medical Image Generation | CT Chest | RadFID | 0.08 | 6 |
| Medical Image Generation | CT Medical Imaging (val) | KID (Chest) | 0.012 | 6 |
| Medical Image Generation | CT Abdomen | RadFID | 0.05 | 5 |
| Medical Image Generation | MRI Abdomen | RadFID | 0.11 | 5 |
| Medical Image Generation | CT Heart | RadFID | 0.46 | 4 |
| Medical Image Generation | CT Spine | RadFID | 0.07 | 4 |
| Medical Image Generation | MRI Heart | RadFID | 0.25 | 4 |