
In-Context Learning Unlocked for Diffusion Models

About

We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models. Given a pair of task-specific example images (for example depth-to-image or image-to-depth, scribble-to-image or image-to-scribble) together with text guidance, our model automatically infers the underlying task and performs the same task on a new query image, following the text guidance. To achieve this, we propose a vision-language prompt that can model a wide range of vision-language tasks, and a diffusion model that takes this prompt as input. The diffusion model is trained jointly on six different tasks using these prompts. The resulting Prompt Diffusion model is the first diffusion-based vision-language foundation model capable of in-context learning. It demonstrates high-quality in-context generation on the trained tasks and generalizes effectively to new, unseen vision tasks given their respective prompts. Our model also shows compelling text-guided image editing results. Our framework aims to facilitate research into in-context learning for computer vision. We share our code and pre-trained models at https://github.com/Zhendong-Wang/Prompt-Diffusion.
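The prompt described above bundles an example pair (source and target image), a text instruction, and a new query image. The sketch below is a toy illustration of that contract only: `VisionLanguagePrompt` and `toy_in_context_apply` are hypothetical names, and the "model" is a per-channel least-squares affine fit, not a diffusion model and not the authors' method. It shows the core idea of in-context learning: the task is inferred from the example pair and then applied to an unseen query.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class VisionLanguagePrompt:
    """Hypothetical container for one in-context prompt.

    example_source -> example_target demonstrates the task
    (e.g. depth map -> image); text carries the language guidance.
    """
    example_source: np.ndarray  # shape (H, W, C)
    example_target: np.ndarray  # shape (H, W, C)
    text: str


def toy_in_context_apply(prompt: VisionLanguagePrompt, query: np.ndarray) -> np.ndarray:
    """Toy stand-in for the model (an assumption, not Prompt Diffusion).

    Fits target = a * source + b per channel from the example pair by least
    squares, then applies the fitted map to the query. The text is ignored
    here; the real model conditions a diffusion process on all inputs.
    """
    channels = prompt.example_source.shape[-1]
    src = prompt.example_source.reshape(-1, channels).astype(np.float64)
    tgt = prompt.example_target.reshape(-1, channels).astype(np.float64)
    out = np.empty(query.shape, dtype=np.float64)
    for c in range(channels):
        # Design matrix [source, 1] so lstsq solves for slope a and intercept b.
        design = np.stack([src[:, c], np.ones_like(src[:, c])], axis=1)
        (a, b), *_ = np.linalg.lstsq(design, tgt[:, c], rcond=None)
        out[..., c] = a * query[..., c] + b
    return out


# Usage: the example pair demonstrates colour inversion; the query is new.
rng = np.random.default_rng(0)
example = rng.random((8, 8, 3))
prompt = VisionLanguagePrompt(example, 1.0 - example, "invert the colors")
query = rng.random((4, 4, 3))
result = toy_in_context_apply(prompt, query)
```

Because the demonstrated task here is exactly affine, the fit recovers it and `result` matches `1.0 - query` up to floating-point error; real vision tasks are far from affine, which is why the paper uses a diffusion model instead.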

Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Controllable Image Generation | COCO (test) | Inference Latency (s) | 9.63 | 14
Image Manipulation | Image manipulation Few-shot (In Distribution) | CLIP-Dir | 17.13 | 7
Image Manipulation | Few-shot image manipulation (Out of Distribution) | CLIP Directional Score | 15.41 | 6
Conditional Image Generation (HED Edge) | COCO 2017 (val), 5,000 samples | FID | 59.4 | 6
Depth Estimation | Visual In-Context Learning (V-ICL) Benchmark | AbsRel | 0.16 | 5
Edge Detection | Visual In-Context Learning (V-ICL) Benchmark | RMSE | 35.88 | 5
Colorization | Visual In-Context Learning (V-ICL) Benchmark | FID | 179.2 | 5
Object Detection | PASCAL-5i | mIoU | 32.6 | 5
Surface Normal Estimation | Visual In-Context Learning (V-ICL) Benchmark | Median Angular Error | 97.27 | 5
Image Deraining | Visual In-Context Learning (V-ICL) Benchmark | PSNR (dB) | 8.67 | 5

(10 of 28 benchmark rows shown.)
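The table mixes several standard metrics. The NumPy sketch below gives their common textbook definitions; the function names are hypothetical helpers, and it is an assumption that the benchmarks listed above use exactly these conventions (for example the PSNR peak value, masking of invalid depth pixels, or whether CLIP-Dir is reported multiplied by 100). FID is omitted because it requires a pretrained Inception network. Lower is better for AbsRel, RMSE, FID and median angular error; higher is better for PSNR, mIoU and CLIP directional similarity.

```python
import numpy as np


def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute relative error, mean(|pred - gt| / gt), over pixels with gt > 0."""
    mask = gt > 0
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))


def rmse(pred: np.ndarray, gt: np.ndarray) -> float:
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((pred - gt) ** 2)))


def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE).

    max_val = 255 assumes 8-bit images; identical inputs give MSE = 0 (infinite PSNR).
    """
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))


def miou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union over classes that appear in pred or gt."""
    ious = []
    for k in range(num_classes):
        inter = np.logical_and(pred == k, gt == k).sum()
        union = np.logical_or(pred == k, gt == k).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))


def median_angular_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Median angle in degrees between predicted and ground-truth normals, shape (..., 3)."""
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    return float(np.median(np.degrees(np.arccos(cos))))


def clip_directional_similarity(img_src, img_edit, txt_src, txt_edit) -> float:
    """Cosine between the change in image embedding and the change in text embedding.

    Inputs are assumed to be precomputed CLIP embeddings (1-D arrays).
    """
    d_img = np.asarray(img_edit) - np.asarray(img_src)
    d_txt = np.asarray(txt_edit) - np.asarray(txt_src)
    return float(np.dot(d_img, d_txt) / (np.linalg.norm(d_img) * np.linalg.norm(d_txt)))


# Usage on tiny hand-made inputs.
depth_err = abs_rel(np.array([1.5, 2.0, 3.0]), np.array([1.0, 2.0, 4.0]))
edge_err = rmse(np.array([1.5, 2.0, 3.0]), np.array([1.0, 2.0, 4.0]))
derain_db = psnr(np.full((2, 2), 110.0), np.full((2, 2), 100.0))
seg_miou = miou(np.array([[0, 1], [1, 1]]), np.array([[0, 0], [1, 1]]), num_classes=2)
normal_err = median_angular_error(
    np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]),
    np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]),
)
clip_dir = clip_directional_similarity([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 2.0])
```

Each helper computes one score for a single prediction; benchmark numbers such as those in the table are typically averages over a whole test set.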

Other info

Code: https://github.com/Zhendong-Wang/Prompt-Diffusion
