
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

About

Text-to-image diffusion models have demonstrated remarkable capabilities in transforming textual prompts into coherent images, yet the computational cost of their inference remains a persistent challenge. To address this issue, we present UFOGen, a novel generative model designed for ultra-fast, one-step text-to-image synthesis. In contrast to conventional approaches that focus on improving samplers or employing distillation techniques for diffusion models, UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN objective. Leveraging a newly introduced diffusion-GAN objective and initialization with pre-trained diffusion models, UFOGen excels in efficiently generating high-quality images conditioned on textual descriptions in a single step. Beyond traditional text-to-image generation, UFOGen showcases versatility in applications. Notably, UFOGen stands among the pioneering models enabling one-step text-to-image generation and diverse downstream tasks, presenting a significant advancement in the landscape of efficient generative models.

Yanwu Xu, Yang Zhao, Zhisheng Xiao, Tingbo Hou • 2023

Related benchmarks

Task                       Dataset                      Result          Rank
Video Generation           UCF-101 (test)               --              105
Text-to-Image Generation   COCO 30k subset 2014 (val)   FID 12.78       46
Text-to-Image Generation   MS COCO zero-shot            FID 12.78       42
Text-to-Image Synthesis    MSCOCO                       FID 12.78       31
Text-to-Image Generation   MS-COCO 512x512 zero-shot    FID 12.78       19
Text-to-Image Generation   MSCOCO 2017 (5k)             FID (5k) 22.5   9
