LRM: Large Reconstruction Model for Single Image to 3D

About

We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds. In contrast to many previous methods that are trained on small-scale datasets such as ShapeNet in a category-specific fashion, LRM adopts a highly scalable transformer-based architecture with 500 million learnable parameters to directly predict a neural radiance field (NeRF) from the input image. We train our model in an end-to-end manner on massive multi-view data containing around 1 million objects, including both synthetic renderings from Objaverse and real captures from MVImgNet. This combination of a high-capacity model and large-scale training data empowers our model to be highly generalizable and produce high-quality 3D reconstructions from various testing inputs, including real-world in-the-wild captures and images created by generative models. Video demos and interactable 3D meshes can be found on our LRM project webpage: https://yiconghong.me/LRM.
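The pipeline described above (a single image encoded by a transformer, decoded into a neural radiance field) can be sketched as follows. This is a minimal illustrative mock-up, not the authors' implementation: the layer sizes, the patch encoder, and the triplane-style NeRF decoding are assumptions chosen for brevity, and all names (`MiniLRM`, `plane_res`, etc.) are hypothetical.

```python
# Minimal sketch of an LRM-style pipeline: image tokens -> transformer ->
# triplane features -> NeRF-style (RGB, density) queries at 3D points.
# All sizes are toy values; the real model uses ~500M parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MiniLRM(nn.Module):
    def __init__(self, dim=64, plane_res=8, patch_dim=3 * 8 * 8):
        super().__init__()
        self.plane_res = plane_res
        # Stand-in image encoder: linear patch embedding + transformer layers.
        self.patch_embed = nn.Linear(patch_dim, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Learnable triplane tokens, refined by cross-attention to image tokens.
        self.triplane_tokens = nn.Parameter(torch.randn(3 * plane_res**2, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Tiny NeRF MLP: concatenated triplane features -> (RGB, density).
        self.nerf_mlp = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 4)
        )

    def forward(self, patches, points):
        # patches: (B, n_patches, patch_dim); points: (B, N, 3) in [-1, 1].
        tokens = self.encoder(self.patch_embed(patches))
        tri = self.triplane_tokens.expand(patches.shape[0], -1, -1)
        tri, _ = self.cross_attn(tri, tokens, tokens)
        B, R = patches.shape[0], self.plane_res
        planes = tri.view(B, 3, R, R, -1)
        # Sample each of the XY/XZ/YZ planes at the projected point coords.
        feats = []
        for i, (a, b) in enumerate([(0, 1), (0, 2), (1, 2)]):
            grid = points[:, :, [a, b]].unsqueeze(2)          # (B, N, 1, 2)
            plane = planes[:, i].permute(0, 3, 1, 2)          # (B, C, R, R)
            f = F.grid_sample(plane, grid, align_corners=True)
            feats.append(f.squeeze(-1).permute(0, 2, 1))      # (B, N, C)
        out = self.nerf_mlp(torch.cat(feats, dim=-1))         # (B, N, 4)
        rgb, density = out[..., :3].sigmoid(), out[..., 3:].relu()
        return rgb, density
```

At inference, the predicted radiance field would be volume-rendered along camera rays to produce novel views, or marched to extract a mesh; both steps are omitted here for brevity.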

Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Novel View Synthesis | Google Scanned Objects (GSO) (test) | PSNR | 14.526 | 24
3D Shape Reconstruction | OmniObject3D | CD | 0.407 | 17
Image-conditioned 3D Generation | Objaverse (test) | FID | 38.41 | 10
3D Shape Reconstruction | Pix3D | FS@1 | 0.1458 | 10
Image-to-3D Generation | User Study (test) | Multi-view Consistency | 6.72 | 8
Image-to-3D Mesh Generation | GSO (test) | PSNR | 18.0433 | 8
3D Shape Reconstruction | Ocrtoc3D (test) | FS@1 | 0.1552 | 7
Single Image to 3D Reconstruction | Google Scanned Objects (GSO) orbiting views | Chamfer Distance | 0.1479 | 7
3D Reconstruction | OmniObject3D | PSNR | 18.04 | 7
Single Image to 3D Reconstruction | Google Scanned Objects (GSO) orbiting views | PSNR | 16.728 | 7

Showing 10 of 19 rows.
