
EscherNet: A Generative Model for Scalable View Synthesis

About

We introduce EscherNet, a multi-view conditioned diffusion model for view synthesis. EscherNet learns implicit and generative 3D representations coupled with a specialised camera positional encoding, allowing precise and continuous relative control of the camera transformation between an arbitrary number of reference and target views. EscherNet offers exceptional generality, flexibility, and scalability in view synthesis: it can generate more than 100 consistent target views simultaneously on a single consumer-grade GPU, despite being trained with a fixed number of 3 reference views to 3 target views. As a result, EscherNet not only addresses zero-shot novel view synthesis, but also naturally unifies single- and multi-image 3D reconstruction, combining these diverse tasks into a single, cohesive framework. Our extensive experiments demonstrate that EscherNet achieves state-of-the-art performance in multiple benchmarks, even when compared to methods specifically tailored for each individual problem. This remarkable versatility opens up new directions for designing scalable neural architectures for 3D vision. Project page: https://kxhit.github.io/EscherNet.
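The abstract credits a camera positional encoding with giving relative control of the camera transformation between views. Below is a minimal sketch of one way such an encoding can make an attention score depend only on the relative pose between two cameras. It assumes 4×4 rigid camera poses, encodes the query with the inverse-transpose of its camera pose and the key with its own pose, applied block-diagonally to each 4-dimensional chunk of the feature vector. The helper names (`pose`, `encode`, `encoded_score`) are hypothetical, and this is an illustration in the spirit of the paper's encoding, not a claim about EscherNet's actual implementation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def rigid_inverse(p):
    """Invert a 4x4 rigid transform [[R, t], [0, 1]] as [[R^T, -R^T t], [0, 1]]."""
    r_t = [[p[j][i] for j in range(3)] for i in range(3)]
    t = [p[i][3] for i in range(3)]
    new_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def pose(yaw, tx, ty, tz):
    """Hypothetical helper: rigid pose with a rotation about z plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def encode(vec, mat):
    """Apply a 4x4 matrix to each consecutive 4-dim chunk of vec (block-diagonal)."""
    assert len(vec) % 4 == 0, "feature size must be a multiple of 4"
    out = []
    for start in range(0, len(vec), 4):
        chunk = vec[start:start + 4]
        out.extend(sum(mat[i][k] * chunk[k] for k in range(4)) for i in range(4))
    return out


def encoded_score(q, k, pose_q, pose_k):
    """Dot product of q' = P_q^{-T} q and k' = P_k k, chunk by chunk.

    Per chunk this equals q^T (P_q^{-1} P_k) k, so only the relative pose matters.
    """
    q_enc = encode(q, transpose(rigid_inverse(pose_q)))
    k_enc = encode(k, pose_k)
    return sum(a * b for a, b in zip(q_enc, k_enc))


q = [0.3, -1.2, 0.5, 2.0, 1.1, 0.0, -0.7, 0.4]
k = [0.9, 0.2, -0.4, 1.0, -0.3, 0.8, 0.6, -1.0]
p1, p2 = pose(0.4, 1.0, -2.0, 0.5), pose(-1.1, 0.3, 0.8, -1.5)
g = pose(2.3, -4.0, 1.0, 3.0)  # the same global motion applied to both cameras
print(math.isclose(encoded_score(q, k, p1, p2),
                   encoded_score(q, k, matmul(g, p1), matmul(g, p2)),
                   abs_tol=1e-9))  # True
```

Because (G P₁)⁻¹(G P₂) = P₁⁻¹P₂, moving every camera by the same transform G leaves the score unchanged. This is the sense in which such an encoding is relative rather than absolute, and it places no limit on how many views are compared.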

Xin Kong, Shikun Liu, Xiaoyang Lyu, Marwan Taher, Xiaojuan Qi, Andrew J. Davison • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Novel View Synthesis | GSO | PSNR | 26.3 | 25 |
| Novel View Synthesis | DL3DV (6-view) | PSNR | 12.07 | 25 |
| Novel View Synthesis | Google Scanned Objects | PSNR | 24.09 | 15 |
| Novel View Synthesis | Objaverse | PSNR | 13.44 | 12 |
| 3D Object Reconstruction | GSO-30 | Chamfer Distance (×10^-3) | 0.0175 | 11 |
| Novel View Synthesis | GSO-30 | PSNR | 25.09 | 11 |
| Novel View Synthesis | OmniObject3D | PSNR | 25.3 | 10 |
| Novel View Synthesis | Objaverse-LVIS (test) | Score | 3.2 | 7 |
| 6-view Novel View Synthesis | Mip-NeRF 360 | PSNR | 11.14 | 7 |
| Novel View Synthesis | GSO30 (test) | FID | 14.7044 | 6 |

Showing 10 of 13 rows.
