
MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

About

This paper introduces MVDiffusion, a simple yet effective method for generating consistent multi-view images from text prompts given pixel-to-pixel correspondences (e.g., perspective crops from a panorama or multi-view images given depth maps and poses). Unlike prior methods that rely on iterative image warping and inpainting, MVDiffusion simultaneously generates all images with a global awareness, effectively addressing the prevalent error accumulation issue. At its core, MVDiffusion processes perspective images in parallel with a pre-trained text-to-image diffusion model, while integrating novel correspondence-aware attention layers to facilitate cross-view interactions. For panorama generation, while only trained with 10k panoramas, MVDiffusion is able to generate high-resolution photorealistic images for arbitrary texts or extrapolate one perspective image to a 360-degree view. For multi-view depth-to-image generation, MVDiffusion demonstrates state-of-the-art performance for texturing a scene mesh.
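The core mechanism described above is a correspondence-aware attention layer: each pixel in one view attends only to its known corresponding pixels in a neighboring view, rather than to the whole image. A rough single-head sketch is given below; the cyclic neighbor pairing, the tensor shapes, and the `K`-correspondence indexing are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def correspondence_aware_attention(feats, corr):
    """
    Sketch of cross-view attention restricted to corresponding pixels.

    feats: (V, N, C) per-view pixel features (V views, N pixels, C channels)
    corr:  (V, N, K) for each pixel of view v, the indices of its K
           corresponding pixels in a neighboring view (hypothetical layout)
    Returns updated features with the same shape as `feats`.
    """
    V, N, C = feats.shape
    out = np.empty_like(feats)
    for v in range(V):
        nb = (v + 1) % V                       # assumed cyclic neighbor pairing
        keys = feats[nb][corr[v]]              # (N, K, C) gathered correspondences
        q = feats[v][:, None, :]               # (N, 1, C) queries
        scores = (q * keys).sum(-1) / np.sqrt(C)   # (N, K) dot-product scores
        attn = softmax(scores, axis=-1)            # attention over the K matches
        out[v] = (attn[..., None] * keys).sum(1)   # (N, C) weighted aggregation
    return out
```

Because attention is confined to the K corresponding pixels rather than all N pixels of every other view, all views can be denoised in parallel while still exchanging information along the known pixel-to-pixel correspondences.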

Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, Yasutaka Furukawa • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Texture Synthesis | 3D-Front (test) | CLIP Score 18.47 | 7 |
| Text-to-Panorama Generation | PEBench (test) | FID 96.07 | 7 |
| Panorama Generation | Matterport3D (test) | FID 21.44 | 5 |
| Multi-view depth-to-image generation | ScanNet (test) | FID 23.1 | 3 |

Other info

Code
