Unpaired Image-to-Image Translation via a Self-Supervised Semantic Bridge

About

Adversarial diffusion and diffusion-inversion methods have advanced unpaired image-to-image translation, but each faces key limitations. Adversarial approaches require target-domain adversarial loss during training, which can limit generalization to unseen data, while diffusion-inversion methods often produce low-fidelity translations due to imperfect inversion into noise-latent representations. In this work, we propose the Self-Supervised Semantic Bridge (SSB), a versatile framework that integrates external semantic priors into diffusion bridge models to enable spatially faithful translation without cross-domain supervision. Our key idea is to leverage self-supervised visual encoders to learn representations that are invariant to appearance changes but capture geometric structure, forming a shared latent space that conditions the diffusion bridges. Extensive experiments show that SSB outperforms strong prior methods for challenging medical image synthesis in both in-domain and out-of-domain settings, and extends easily to high-quality text-guided editing.

Jiaming Liu, Felix Petersen, Yunhe Gao, Yabin Zhang, Hyojin Kim, Akshay S. Chaudhari, Yu Sun, Stefano Ermon, Sergios Gatidis • 2026
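
For readers unfamiliar with the term "diffusion bridge" used in the abstract, the following is a minimal sketch of a generic Brownian bridge, a stochastic path pinned at two fixed endpoints. It is an illustration under stated assumptions, not SSB's formulation: the function name, the `sigma` parameter, and the toy pixel lists are all hypothetical, and the semantic conditioning described in the abstract is not modeled here.

```python
import math
import random


def brownian_bridge_sample(x0, x1, t, sigma=1.0, rng=random):
    """Draw x_t from a Brownian bridge pinned at x0 (t=0) and x1 (t=1).

    x0 and x1 are flat lists of pixel values of equal length.
    This is a textbook bridge, not the paper's model (assumption).
    """
    if len(x0) != len(x1):
        raise ValueError("endpoints must have the same length")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    # Noise scale vanishes at both endpoints and peaks at t = 0.5.
    std = sigma * math.sqrt(t * (1.0 - t))
    return [(1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0)
            for a, b in zip(x0, x1)]


# Hypothetical stand-ins for a source-domain and a target-domain image.
rng = random.Random(0)
source = [0.0, 0.2, 0.4, 0.6]
target = [1.0, 0.8, 0.6, 0.4]

start = brownian_bridge_sample(source, target, 0.0, rng=rng)  # equals source
end = brownian_bridge_sample(source, target, 1.0, rng=rng)    # equals target
mid = brownian_bridge_sample(source, target, 0.5, rng=rng)    # noisy blend
```

At t = 0 and t = 1 the sample coincides with the source and target respectively. In between, it is a noisy interpolation, which is the property that lets a bridge model transport one image distribution to another.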

Related benchmarks

Task	Dataset	Result
Image-to-Image Translation	MRI to CT Out-of-domain	FID 30.15	9
Image-to-Image Translation	MRI to CT In-domain	MS-SSIM 81	9
Image-to-Image Translation	Natural I2I Horse→Zebra Apple→Orange (test)	CLIP-T 0.322	7
MRI to CT translation	medical MRI→CT 256 × 256 (test)	NFE 150	7
Image-to-Image Translation and Editing	text-guided I2I translation and editing SD3-M	Inference Time (s) 14.13	6
Image-to-Image Translation	Natural Image Translation Text-free I2I	Inference Time (s) 6.56	5

