One-Step Image Translation with Text-to-Image Models
About
In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning. To tackle these issues, we introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. Specifically, we consolidate various modules of the vanilla latent diffusion model into a single end-to-end generator network with a small number of trainable weights, enhancing its ability to preserve the input image structure while reducing overfitting. We demonstrate that, in unpaired settings, our model CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods on various scene translation tasks, such as day-to-night conversion and adding/removing weather effects like fog, snow, and rain. We extend our method to paired settings, where our model pix2pix-Turbo is on par with recent works like ControlNet for Sketch2Photo and Edge2Image, but with single-step inference. This work suggests that single-step diffusion models can serve as strong backbones for a range of GAN learning objectives. Our code and models are available at https://github.com/GaParmar/img2img-turbo.
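The "small number of trainable weights" idea above can be sketched with low-rank adapters added to a frozen pretrained weight matrix (a LoRA-style scheme). This is a toy illustration only; the matrix names, shapes, and initialization below are assumptions for exposition, not the repo's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # full hidden dimension vs. low adapter rank (illustrative sizes)

W_frozen = rng.standard_normal((d, d))  # pretrained weight, kept fixed
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero at init

def adapted_forward(x):
    # Output = frozen path + low-rank update; only A and B receive gradients.
    # With B initialized to zero, the adapted model starts identical to the
    # pretrained one, which helps preserve the input image structure.
    return W_frozen @ x + B @ (A @ x)

# The trainable parameters are a small fraction of the frozen ones:
frozen_params = W_frozen.size      # d * d = 4096
trainable_params = A.size + B.size  # 2 * r * d = 512
```

At initialization `adapted_forward` reproduces the frozen network exactly, so fine-tuning only perturbs it through the 512 adapter weights rather than retraining all 4096.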
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Explainability Stability Analysis | Instruction-based Image Editing Stability Evaluation (10 prompts, 30 perturbations) | Jaccard Index: 85 | 6 |
| Instruction-based Image Editing Consistency | "Transform the weather to make it snowing" prompt, 1000 iterations (30 perturbations) | Variance: 1.00e-4 | 3 |
| Fidelity Analysis | gSMILE Fidelity Analysis, prompt: "Transform the weather to make it snowing" (test) | WMSE: 0.0193 | 3 |