LRM: Large Reconstruction Model for Single Image to 3D
About
We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds. In contrast to many previous methods that are trained on small-scale datasets such as ShapeNet in a category-specific fashion, LRM adopts a highly scalable transformer-based architecture with 500 million learnable parameters to directly predict a neural radiance field (NeRF) from the input image. We train our model in an end-to-end manner on massive multi-view data containing around 1 million objects, including both synthetic renderings from Objaverse and real captures from MVImgNet. This combination of a high-capacity model and large-scale training data empowers our model to be highly generalizable and to produce high-quality 3D reconstructions from diverse test inputs, including real-world in-the-wild captures and images created by generative models. Video demos and interactive 3D meshes can be found on our LRM project webpage: https://yiconghong.me/LRM.
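Since LRM's output is a neural radiance field, novel views are produced by volume rendering along camera rays. The following NumPy sketch shows only that generic rendering step (the standard NeRF compositing equation), not LRM's predictor itself; the function name and the toy inputs are illustrative assumptions.

```python
import numpy as np

def volume_render(densities, colors, deltas):
    """Composite per-sample densities and colors along one ray
    (standard NeRF volume-rendering quadrature).

    densities: (N,) non-negative sigma at each sample
    colors:    (N, 3) RGB at each sample
    deltas:    (N,) distance between consecutive samples
    """
    alphas = 1.0 - np.exp(-densities * deltas)      # per-sample opacity
    trans = np.cumprod(1.0 - alphas)                # transmittance after each sample
    trans = np.concatenate([[1.0], trans[:-1]])     # T_i = prod_{j<i}(1 - alpha_j)
    weights = trans * alphas                        # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)  # (3,) pixel color

# Toy ray: 64 uniform samples through a constant red emitter.
sigma = np.full(64, 0.5)
rgb = np.tile([1.0, 0.0, 0.0], (64, 1))
dt = np.full(64, 0.1)
pixel = volume_render(sigma, rgb, dt)
# pixel[0] equals 1 - exp(-sum(sigma * dt)) = 1 - exp(-3.2) for this constant medium
```

The closed-form check in the last comment follows because, for constant density, the compositing weights telescope to the total absorbed fraction of the ray.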
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Shape Reconstruction | OmniObject3D | CD | 0.407 | 17 |
| Image-conditioned 3D Generation | Objaverse (test) | FID | 38.41 | 10 |
| 3D Shape Reconstruction | Pix3D | FS@1 | 0.1458 | 10 |
| Image-to-3D Generation | User Study (test) | Multi-view Consistency | 6.72 | 8 |
| Image-to-3D Mesh Generation | GSO (test) | PSNR | 18.0433 | 8 |
| 3D Shape Reconstruction | Ocrtoc3D (test) | FS@1 | 0.1552 | 7 |
| Single Image to 3D Reconstruction | Google Scanned Objects (GSO) orbiting views | Chamfer Distance | 0.1479 | 7 |
| 3D Reconstruction | OmniObject3D | PSNR | 18.04 | 7 |
| Single Image to 3D Reconstruction | Google Scanned Objects (GSO) orbiting views | PSNR | 16.728 | 7 |
| Collision-free path planning | plants | Path Length | 3.2 | 6 |