
CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model

About

Feed-forward 3D generative models such as the Large Reconstruction Model (LRM) have demonstrated exceptional generation speed. However, these transformer-based methods do not exploit the geometric priors of the triplane component in their architecture, which often leads to sub-optimal quality given the limited amount of 3D data and slow training. In this work, we present the Convolutional Reconstruction Model (CRM), a high-fidelity feed-forward single image-to-3D generative model. Recognizing the limitations posed by sparse 3D data, we highlight the necessity of integrating geometric priors into network design. CRM builds on the key observation that a visualization of the triplane exhibits spatial correspondence with six orthographic images. It first generates six orthographic view images from a single input image, then feeds these images into a convolutional U-Net, leveraging its strong pixel-level alignment capabilities and significant bandwidth to create a high-resolution triplane. CRM further employs Flexicubes as its geometric representation, facilitating direct end-to-end optimization on textured meshes. Overall, our model delivers a high-fidelity textured mesh from an image in just 10 seconds, without any test-time optimization.
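To make the triplane idea concrete, here is a minimal sketch of how a generic triplane is usually queried: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three feature vectors are concatenated. This is an illustration under assumptions, not CRM's actual decoder. The plane names (`xy`, `xz`, `yz`), the `[-1, 1]` coordinate range, concatenation as the fusion step, and the function names are all hypothetical choices that the abstract does not specify.

```python
import math


def bilinear_sample(plane, u, v):
    """Sample an H x W grid of feature vectors at (u, v), with u, v in [-1, 1]."""
    h, w = len(plane), len(plane[0])
    # Map [-1, 1] to continuous pixel coordinates and clamp to the grid.
    x = min(max((u + 1.0) * 0.5 * (w - 1), 0.0), w - 1)
    y = min(max((v + 1.0) * 0.5 * (h - 1), 0.0), h - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    features = []
    for c in range(len(plane[0][0])):
        # Interpolate along x on the two neighbouring rows, then along y.
        top = plane[y0][x0][c] * (1 - tx) + plane[y0][x1][c] * tx
        bottom = plane[y1][x0][c] * (1 - tx) + plane[y1][x1][c] * tx
        features.append(top * (1 - ty) + bottom * ty)
    return features


def query_triplane(planes, point):
    """Concatenate the features sampled from the three planes (assumed fusion)."""
    x, y, z = point
    return (
        bilinear_sample(planes["xy"], x, y)
        + bilinear_sample(planes["xz"], x, z)
        + bilinear_sample(planes["yz"], y, z)
    )
```

In a full pipeline, the concatenated feature would typically be passed to a small decoder network that predicts geometry and color at that point. Because each plane is an ordinary 2D feature image, it is natural to produce it with a convolutional network from pixel-aligned input views, which is the correspondence the abstract points to.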

Zhengyi Wang, Yikai Wang, Yifei Chen, Chendong Xiang, Shuo Chen, Dajiang Yu, Chongxuan Li, Hang Su, Jun Zhu • 2024

Related benchmarks

Task | Dataset | Result | Rank
3D Shape Reconstruction | OmniObject3D | CD 0.065 | 17
Single-view 3D Reconstruction | GSO (test) | CD 0.161 | 13
3D Shape Reconstruction | GSO | FS 0.886 | 10
Image-conditioned 3D Generation | Objaverse (test) | FID 45.53 | 10
3D Reconstruction Rendering | GSO | PSNR 15.054 | 10
Image-to-3D Generation | User Study (test) | Multi-view Consistency 7.95 | 8
Image-to-3D Mesh Generation | GSO (test) | PSNR 18.4407 | 8
Single-view 3D Reconstruction | OmniObject3D | Chamfer Distance (CD) 0.155 | 8
Single Image to 3D Reconstruction | Google Scanned Objects (GSO) orbiting views | PSNR 17.435 | 7
3D Reconstruction | OmniObject3D (1700 unseen samples) | CLIP Score 0.672 | 7
Showing 10 of 17 rows
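
For readers unfamiliar with the metric abbreviations in the table, the sketch below gives common textbook definitions of Chamfer distance (CD), F-score (FS), and PSNR in plain Python. These are assumptions about the general form of each metric, not the exact protocols behind the numbers above: benchmarks differ on squared vs. unsquared distances, sum vs. mean, the F-score threshold `tau`, and point sampling density, so values computed this way are not directly comparable with the table. All function names are hypothetical.

```python
import math


def _nearest(p, cloud):
    """Euclidean distance from point p to its nearest neighbour in cloud."""
    return min(math.dist(p, q) for q in cloud)


def chamfer_distance(a, b):
    """Symmetric CD: mean nearest-neighbour distance, summed over both directions."""
    a_to_b = sum(_nearest(p, b) for p in a) / len(a)
    b_to_a = sum(_nearest(q, a) for q in b) / len(b)
    return a_to_b + b_to_a


def f_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    precision = sum(_nearest(p, gt) <= tau for p in pred) / len(pred)
    recall = sum(_nearest(q, pred) <= tau for q in gt) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def psnr(img_a, img_b, max_val=1.0):
    """PSNR in dB between two equally sized, flattened images."""
    mse = sum((x - y) ** 2 for x, y in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_val ** 2 / mse)
```

Lower is better for CD and FID, while higher is better for FS, PSNR, and CLIP Score.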
