Unpaired Image-to-Image Translation via a Self-Supervised Semantic Bridge
About
Adversarial diffusion and diffusion-inversion methods have advanced unpaired image-to-image translation, but each faces key limitations. Adversarial approaches require a target-domain adversarial loss during training, which can limit generalization to unseen data, while diffusion-inversion methods often produce low-fidelity translations because inversion into noise-latent representations is imperfect. In this work, we propose the Self-Supervised Semantic Bridge (SSB), a versatile framework that integrates external semantic priors into diffusion bridge models to enable spatially faithful translation without cross-domain supervision. Our key idea is to leverage self-supervised visual encoders to learn representations that are invariant to appearance changes yet still capture geometric structure, forming a shared latent space that conditions the diffusion bridges. Extensive experiments show that SSB outperforms strong prior methods on challenging medical image synthesis in both in-domain and out-of-domain settings, and that it extends easily to high-quality text-guided editing.
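As a rough illustration of the "diffusion bridge" building block mentioned above, the sketch below samples an intermediate state of a standard Brownian bridge pinned at two endpoints. It is not the SSB implementation: the abstract does not specify how the bridge is parameterized or how the encoder representation conditions it, so the function name `bridge_sample`, the `sigma` parameter, and the use of flat Python lists as stand-ins for images or latents are all assumptions made for illustration.

```python
import math
import random


def bridge_sample(x0, x1, t, sigma=1.0, rng=None):
    """Sample x_t from a Brownian bridge pinned at x0 (t=0) and x1 (t=1).

    Hypothetical helper: x0 and x1 are flat lists of floats standing in
    for the two endpoints of a bridge; SSB's actual endpoints and
    conditioning mechanism are not described in the abstract.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(x0) != len(x1):
        raise ValueError("endpoints must have the same length")
    rng = rng or random.Random()
    # Marginal of a Brownian bridge: the mean interpolates the endpoints,
    # and the standard deviation vanishes at t=0 and t=1.
    std = sigma * math.sqrt(t * (1.0 - t))
    return [(1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0)
            for a, b in zip(x0, x1)]
```

Because the noise term vanishes at both ends, the sample coincides with `x0` at `t=0` and with `x1` at `t=1`; a learned bridge model would be trained to move between such endpoints, with any semantic conditioning supplied as an extra input.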
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image-to-Image Translation | MRI to CT Out-of-domain | FID | 30.15 | 9 |
| Image-to-Image Translation | MRI to CT In-domain | MS-SSIM | 81 | 9 |
| Image-to-Image Translation | Natural I2I Horse→Zebra Apple→Orange (test) | CLIP-T | 0.322 | 7 |
| MRI to CT translation | medical MRI→CT 256 × 256 (test) | NFE | 150 | 7 |
| Image-to-Image Translation and Editing | text-guided I2I translation and editing SD3-M | Inference Time (s) | 14.13 | 6 |
| Image-to-Image Translation | Natural Image Translation Text-free I2I | Inference Time (s) | 6.56 | 5 |