
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance

About

Generating and editing images from open domain text prompts is a challenging task that heretofore has required expensive and specially trained models. We demonstrate a novel methodology for both tasks which is capable of producing images of high visual quality from text prompts of significant semantic complexity without any training by using a multimodal encoder to guide image generations. We demonstrate on a variety of tasks how using CLIP [37] to guide VQGAN [11] produces higher visual quality outputs than prior, less flexible approaches like DALL-E [38], GLIDE [33] and Open-Edit [24], despite not being trained for the tasks presented. Our code is available in a public repository.
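The abstract describes steering an image generator with a multimodal encoder and no additional training. The toy sketch below shows the general shape of that kind of guidance under heavy simplifying assumptions. The "decoder" and "image encoder" are random linear maps standing in for VQGAN's decoder and CLIP's image encoder. The text embedding is a random vector rather than a CLIP text encoding. The objective is plain cosine similarity, maximized by gradient ascent on the latent. VQGAN's codebook quantization and any image augmentations are omitted, and the names `guide_latent` and `cosine` are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def guide_latent(z, decoder, image_encoder, text_emb, lr=0.1, steps=200):
    """Gradient ascent on latent z so that image_encoder(decoder(z)) aligns with text_emb.

    Assumption: decoder and image_encoder are plain matrices (stand-ins for the
    real networks), so the gradient can be derived by hand. Only z is updated;
    both maps stay frozen.
    """
    t = text_emb / np.linalg.norm(text_emb)   # unit-length text embedding
    M = image_encoder @ decoder                # combined linear map: latent -> embedding
    history = []                               # similarity before each update
    for _ in range(steps):
        e = M @ z                              # embedding of the "decoded image"
        n = np.linalg.norm(e)
        u = e / n
        sim = float(u @ t)
        history.append(sim)
        grad_e = (t - sim * u) / n             # d cos(e, t) / d e for unit t
        grad_z = M.T @ grad_e                  # chain rule back through the linear maps
        z = z + lr * grad_z                    # ascend: increase the similarity
    return z, history


# Usage with small random stand-ins (all dimensions are arbitrary choices).
rng = np.random.default_rng(0)
latent_dim, pixel_dim, emb_dim = 16, 64, 8
decoder = rng.normal(size=(pixel_dim, latent_dim)) / np.sqrt(latent_dim)
image_encoder = rng.normal(size=(emb_dim, pixel_dim)) / np.sqrt(pixel_dim)
text_emb = rng.normal(size=emb_dim)
z0 = rng.normal(size=latent_dim)

z, history = guide_latent(z0, decoder, image_encoder, text_emb)
print(f"similarity: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because only the latent `z` changes while both maps stay fixed, no model parameters are trained, which mirrors the abstract's point that the approach needs no task-specific training. In a real system the gradient would come from automatic differentiation through the pretrained networks rather than from a hand-derived formula.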

Katherine Crowson, Stella Biderman, Daniel Kornis, Dashiell Stander, Eric Hallahan, Louis Castricato, Edward Raff (2022)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Longitudinal Brain MRI Synthesis | ADNI (test) | SSIM | 0.7463 | 13 |
| Target (Aircraft) Classification | Boeing simulated | Precision | 84.69 | 10 |
| Azimuth Angle Classification | Boeing simulated | Precision | 4.84 | 10 |
| Depression Angle Classification | Boeing simulated | Precision | 0.1424 | 10 |
| Polarization Mode Classification | Shanxi real-world (test) | Precision | 75.05 | 10 |
| Azimuth Angle Classification | Shanxi real-world (test) | Precision | 1.39 | 10 |
| Target (Aircraft) Classification | Shanxi real-world (test) | Precision | 92.16 | 10 |
| SAR Image Generation | Shanxi dataset (test) | PSNR | 23.87 | 9 |
| SAR Image Generation | Boeing (test) | PSNR | 29.7 | 9 |
| Longitudinal Brain MRI Synthesis | Brain MRI 0 ≤ Δt < 12 (test) | SSIM | 0.7553 | 7 |

Showing 10 of 14 rows.
