KNN-Diffusion: Image Generation via Large-Scale Retrieval

About

Recent text-to-image models have achieved impressive results. However, since they require large-scale datasets of text-image pairs, it is impractical to train them on new domains where data is scarce or not labeled. In this work, we propose using large-scale retrieval methods, in particular, efficient k-Nearest-Neighbors (kNN), which offers novel capabilities: (1) training a substantially small and efficient text-to-image diffusion model without any text, (2) generating out-of-distribution images by simply swapping the retrieval database at inference time, and (3) performing text-driven local semantic manipulations while preserving object identity. To demonstrate the robustness of our method, we apply our kNN approach to two state-of-the-art diffusion backbones, and show results on several different datasets. As evaluated by human studies and automatic metrics, our method achieves state-of-the-art results compared to existing approaches that train text-to-image generation models using images only (without paired text data).

Shelly Sheynin, Oron Ashual, Adam Polyak, Uriel Singer, Oran Gafni, Eliya Nachmani, Yaniv Taigman • 2022
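
The retrieval step described in the abstract, and the idea of swapping the retrieval database at inference time, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 3-dimensional vectors, the database names (`photos`, `stickers`), and the helpers `cosine` and `knn` are all hypothetical, and the sketch simply assumes that queries and images already live in one shared embedding space. A brute-force search stands in for the efficient large-scale kNN index the abstract refers to.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn(query, database, k):
    """Return the ids of the k database entries most similar to the query.

    Brute-force search over a dict of id -> embedding (illustration only).
    """
    ranked = sorted(
        database.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Hypothetical toy embeddings for two retrieval databases (assumed values).
photos = {
    "photo_cat": [0.9, 0.1, 0.0],
    "photo_dog": [0.1, 0.9, 0.0],
    "photo_car": [0.0, 0.1, 0.9],
}
stickers = {
    "sticker_cat": [0.8, 0.2, 0.1],
    "sticker_dog": [0.2, 0.8, 0.1],
    "sticker_car": [0.1, 0.0, 0.9],
}

# Hypothetical embedding of a query such as "a cat".
query = [1.0, 0.2, 0.0]

# Same query, different database: only the retrieved neighbors change.
print(knn(query, photos, 2))
print(knn(query, stickers, 2))
```

In this sketch the query and the `knn` function stay fixed while the database changes, which mirrors capability (2) in the abstract: the neighbors that would condition generation come from whichever database is plugged in.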

Related benchmarks

Task	Dataset	Result
Text-to-Image Generation	MS-COCO (val)	FID 16.66	202
Grounded Text-to-Image Generation	COCO 2014 (val)	FID 16.66	26
Text-to-Image Synthesis	CUB (test)	FID 42.9	16
Text-to-Image Generation	MS-COCO 30K prompts (val)	FID 16.66	14
Sticker Generation	Stickers dataset (3,000)	Image Quality Score 76	6
Text-to-Image Generation	LN-COCO (test)	FID 35.6	4
Text-to-Image Synthesis	MS-COCO (test)	FID 12.5	4

