MeDUET: Disentangled Unified Pretraining for 3D Medical Image Synthesis and Analysis

About

Self-supervised learning (SSL) and diffusion models have respectively advanced representation learning and generative modeling for high-dimensional 3D visual data, yet they are often developed as separate paradigms. Their unification remains challenging under multi-source heterogeneity, as anatomical content must be preserved for analysis while acquisition-related style varies across centers and affects synthesis. In this paper, we propose MeDUET, a 3D Medical image Disentangled UnifiEd PreTraining framework in the variational autoencoder latent space. MeDUET formulates unified pretraining as an empirical factor identifiability problem, aiming to learn domain-invariant content factors for anatomy and domain-specific style factors for appearance. To improve factor separation, MeDUET first uses token demixing with a standard adversarial domain regularizer to establish basic content-style specialization, and further introduces Mixed Factor Token Distillation and Swap-invariance Quadruplet Contrast to reduce mixed-region factor leakage and organize factor spaces with factor-wise invariance and discriminability. With these learned factors, MeDUET transfers effectively to both synthesis and analysis, yielding higher fidelity, faster convergence, and better controllability for synthesis, while achieving competitive or superior domain generalization and label efficiency on diverse datasets, tasks, and modalities. Overall, MeDUET shows that multi-source heterogeneity can serve as useful supervision, with disentanglement providing an effective interface for unifying 3D medical image synthesis and analysis. Our code is available at https://github.com/JK-Liu7/MeDUET.

Junkai Liu, Ling Shao, Le Zhang • 2026

Related benchmarks

Task	Dataset	Metric	Value
Medical Image Synthesis	VoCo 10k (train/test)	FID	0.7874	16
Segmentation	BTCV	1-shot Score	78.72	13
Segmentation	AMOS	1-shot Score	65.18	13
Segmentation	WORD	1-shot Acc	79.56	13
Segmentation	BraTS 21	Performance (1-shot)	58.05	13
Classification	CC-CCII (10% train ratio)	Accuracy	88.68	10
Classification	CC-CCII (100% train ratio)	Accuracy	93.59	10
Classification	CC-CCII (average across train ratios)	Accuracy	91.35	10
Classification	CC-CCII (50% train ratio)	Accuracy	91.79	10
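
The synthesis row above reports FID, a Fréchet distance between Gaussian fits to real and generated feature sets, where lower is better. This page does not say which feature extractor or implementation produced the reported value, so the following is only a minimal sketch of the distance itself. It assumes diagonal covariances, which simplifies the general formula, and the function names and toy feature vectors are hypothetical.

```python
import math
from statistics import fmean, variance


def gaussian_stats(features):
    """Per-dimension mean and sample variance of a list of feature vectors."""
    columns = list(zip(*features))
    means = [fmean(col) for col in columns]
    variances = [variance(col) for col in columns]
    return means, variances


def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with DIAGONAL covariances.

    Assumption: off-diagonal covariance terms are ignored, so the trace term
    Tr(S1 + S2 - 2 (S1 S2)^(1/2)) reduces to sum((sqrt(v1) - sqrt(v2))^2).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2))
    return mean_term + cov_term


# Toy (hypothetical) features: three 2-D vectors per set.
real = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
fake = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]]

score = frechet_distance_diag(*gaussian_stats(real), *gaussian_stats(fake))
print(score)  # 1.0: the sets differ only by a mean shift of 1 in the first dimension
```

Full FID implementations use the complete covariance matrices and a matrix square root, so values from this simplified version are not comparable to the number in the table.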
