GenFusion: Closing the Loop between Reconstruction and Generation via Videos

About

Recently, 3D reconstruction and generation have demonstrated impressive novel view synthesis results, achieving high fidelity and efficiency. However, a notable conditioning gap can be observed between these two fields: e.g., scalable 3D scene reconstruction often requires densely captured views, whereas 3D generation typically relies on a single input view or none at all, which significantly limits their applications. We found that the source of this phenomenon lies in the misalignment between 3D constraints and generative priors. To address this problem, we propose a reconstruction-driven video diffusion model that learns to condition video frames on artifact-prone RGB-D renderings. Moreover, we propose a cyclical fusion pipeline that iteratively adds restored frames from the generative model to the training set, enabling progressive expansion and addressing the viewpoint saturation limitations seen in previous reconstruction and generation pipelines. Our evaluation, including view synthesis from sparse-view and masked inputs, validates the effectiveness of our approach. More details at https://genfusion.sibowu.com.
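The cyclical fusion loop described in the abstract can be pictured as a simple control-flow skeleton. This is a minimal toy under stated assumptions, not the authors' implementation: `cyclical_fusion`, `Frame`, and every callback (`reconstruct`, `render_rgbd`, `restore_clip`, `propose_poses`) are hypothetical names, camera poses are reduced to a single number, and images are placeholder strings. Only the structure follows the abstract: reconstruct from the current training set, render artifact-prone RGB-D views at new viewpoints, restore them with a generative model, and append the restored frames before the next cycle.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

Scene = TypeVar("Scene")  # whatever the (hypothetical) reconstruction step returns


@dataclass
class Frame:
    pose: float      # toy 1-D camera position standing in for a full camera pose
    image: str       # placeholder for pixel data
    generated: bool  # True if the frame was produced by the generative model


def cyclical_fusion(
    captured: List[Frame],
    reconstruct: Callable[[List[Frame]], Scene],
    render_rgbd: Callable[[Scene, float], str],
    restore_clip: Callable[[List[str]], List[str]],
    propose_poses: Callable[[List[Frame]], List[float]],
    num_cycles: int,
) -> List[Frame]:
    """Alternate reconstruction and generation, growing the training set each cycle."""
    training_set = list(captured)
    for _ in range(num_cycles):
        scene = reconstruct(training_set)                 # fit to all frames collected so far
        poses = propose_poses(training_set)               # viewpoints to add this cycle
        renders = [render_rgbd(scene, p) for p in poses]  # artifact-prone RGB-D renderings
        restored = restore_clip(renders)                  # one clip in, one restored clip out
        training_set.extend(
            Frame(pose=p, image=img, generated=True) for p, img in zip(poses, restored)
        )
    return training_set


# --- Toy stand-ins (all hypothetical) ---
def toy_reconstruct(frames: List[Frame]) -> List[float]:
    return sorted(f.pose for f in frames)


def toy_render(scene: List[float], pose: float) -> str:
    return f"rgbd@{pose}"


def toy_restore(clip: List[str]) -> List[str]:
    return [r.replace("rgbd", "restored") for r in clip]


def toy_propose(frames: List[Frame]) -> List[float]:
    edge = max(f.pose for f in frames)  # step outward from the furthest known view
    return [edge + 1.0, edge + 2.0]


captured = [Frame(0.0, "cap0", False), Frame(1.0, "cap1", False)]
result = cyclical_fusion(captured, toy_reconstruct, toy_render, toy_restore, toy_propose, num_cycles=3)
print(len(result), result[-1].image)  # 8 restored@7.0
```

In a full pipeline these callbacks would presumably wrap a 3D reconstruction method, an RGB-D renderer, and the video diffusion model; the outward-stepping pose proposal here is only one assumed way to obtain the progressive expansion the abstract mentions.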

Sibo Wu, Congrong Xu, Binbin Huang, Andreas Geiger, Anpei Chen• 2025

Related benchmarks

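Task                 | Dataset                     | Metric | Result | Rank
Novel View Synthesis | RE10K                       | SSIM   | 81.8   | 142
Novel View Synthesis | Re10K (test)                | PSNR   | 20.618 | 79
Novel View Synthesis | DL3DV (test)                | PSNR   | 18.363 | 61
Novel View Synthesis | T&T small-viewpoint set (O) | PSNR   | 20.14  | 44
Novel View Synthesis | RE10K Small                 | PSNR   | 14.87  | 38
New View Synthesis   | T&T                         | LPIPS  | 0.278  | 33
New View Synthesis   | LLFF (R)                    | SSIM   | 0.864  | 32
Novel View Synthesis | DL3DV S                     | LPIPS  | 0.547  | 25
New View Synthesis   | DTU (R)                     | SSIM   | 55.4   | 24
Novel View Synthesis | DTU small-viewpoint set (R) | PSNR   | 17.87  | 24

Showing 10 of 57 rows. SSIM cannot exceed 1, so the SSIM entries of 81.8 (RE10K) and 55.4 (DTU (R)) appear to be reported on a 0 to 100 scale, unlike the 0.864 reported for LLFF (R).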
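PSNR, the metric reported in most rows of the benchmark table, is computed in decibels from the mean squared error (MSE) between a rendered view and the ground-truth image: PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value. The sketch below assumes flattened pixel values in [0, 1]; the helper name `psnr` is ours, and the leaderboards' actual evaluation scripts may differ in details such as color space, cropping, or how scores are averaged over images.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized flattened images."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


ground_truth = [0.0] * 16
rendered = [0.1] * 16  # every pixel off by 0.1, so MSE = 0.01
print(round(psnr(ground_truth, rendered), 6))  # 20.0
```

Higher PSNR and SSIM are better; for LPIPS, also listed in the table, lower is better.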
