
MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model

About

We introduce MVGenMaster, a multi-view diffusion model enhanced with 3D priors to address versatile Novel View Synthesis (NVS) tasks. MVGenMaster leverages 3D priors that are warped using metric depth and camera poses, significantly enhancing both generalization and 3D consistency in NVS. Our model features a simple yet effective pipeline that can generate up to 100 novel views conditioned on variable reference views and camera poses with a single forward process. Additionally, we have developed a comprehensive large-scale multi-view image dataset called MvD-1M, comprising up to 1.6 million scenes, equipped with well-aligned metric depth to train MVGenMaster. Moreover, we present several training and model modifications to strengthen the model with scaled-up datasets. Extensive evaluations across in- and out-of-domain benchmarks demonstrate the effectiveness of our proposed method and data formulation. Models and code will be released at https://github.com/ewrfcas/MVGenMaster/.
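
To make "warped using metric depth and camera poses" concrete, the sketch below shows the standard pinhole reprojection that this kind of warping usually relies on: unproject each reference pixel with its metric depth, move it into the target camera frame, and project it again. It is an illustration under stated assumptions, not the authors' implementation. The function name `warp_pixels`, the shared intrinsics `K`, and the pose conventions (camera-to-world for the reference view, world-to-camera for the target view) are hypothetical choices, and the abstract does not say which signals are warped or how they are fed to the diffusion model.

```python
import numpy as np

def warp_pixels(depth, K, ref_c2w, tgt_w2c):
    """Reproject every reference pixel into a target view (hypothetical helper).

    depth:   (H, W) metric depth of the reference view
    K:       (3, 3) pinhole intrinsics, assumed shared by both views
    ref_c2w: (4, 4) camera-to-world pose of the reference view
    tgt_w2c: (4, 4) world-to-camera pose of the target view
    Returns (H, W, 2) pixel coordinates in the target view and (H, W) target depth.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)

    # Unproject: X_cam = depth * K^-1 [u, v, 1]^T
    cam = (pix @ np.linalg.inv(K).T) * depth[..., None]

    # Move points from the reference camera frame to the target camera frame.
    cam_h = np.concatenate([cam, np.ones((H, W, 1))], axis=-1)
    tgt = (cam_h @ (tgt_w2c @ ref_c2w).T)[..., :3]

    # Project with the same intrinsics and divide by depth.
    proj = tgt @ K.T
    z = proj[..., 2]
    uv = proj[..., :2] / np.clip(z, 1e-8, None)[..., None]
    return uv, z
```

With identity poses the function returns the original pixel grid. A pure sideways translation of the target camera shifts each pixel by the focal length times the translation divided by the depth, which is why metric (rather than relative) depth matters for a warp of this kind.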

Chenjie Cao, Chaohui Yu, Shang Liu, Fan Wang, Xiangyang Xue, Yanwei Fu • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Novel View Synthesis | Mip-NeRF360 | PSNR 15.543 | 138 |
| Novel View Synthesis | DTU | PSNR 15.856 | 115 |
| Novel View Synthesis | Tanks&Temples | PSNR 14.79 | 95 |
| Novel View Synthesis | Mip-NeRF 360 out-of-domain 3 | PSNR 14.17 | 8 |
| Novel View Synthesis | DL3DV 27 (test) | PSNR 14.565 | 8 |
| Novel View Synthesis | RealEstate10K 58 (test) | PSNR 15.226 | 8 |
| Novel View Synthesis | CO3D+MVImgNet Ordered (test) | PSNR 18.964 | 4 |
| Novel View Synthesis | CO3D+MVImgNet Unordered (test) | PSNR 21.466 | 4 |
| Novel View Synthesis | DL3DV+Real10k Ordered (test) | PSNR 16.177 | 4 |
| Novel View Synthesis | DL3DV+Real10k Unordered (test) | PSNR 18.296 | 4 |
Showing 10 of 16 rows
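
For readers unfamiliar with the metric, the results above are reported as PSNR (peak signal-to-noise ratio, in dB, higher is better). A minimal sketch of the usual definition, PSNR = 10 · log10(MAX² / MSE), is given below. The function name `psnr` and the default value range of [0, 1] are assumptions for illustration. The listing does not state each benchmark's exact evaluation protocol (image resolution, value range, or how scores are averaged over views and scenes), so this should not be taken as the code that produced these numbers.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    max_val is the largest possible pixel value (1.0 for floats in [0, 1],
    255 for 8-bit images); it is an assumption about the data range.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

As a sanity check on scale, a uniform error of 0.1 on images in [0, 1] gives an MSE of 0.01 and therefore a PSNR of about 20 dB.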
