SAM 3D: 3Dfy Anything in Images

About

We present SAM 3D, a generative model for visually grounded 3D object reconstruction, predicting geometry, texture, and layout from a single image. SAM 3D excels in natural images, where occlusion and scene clutter are common and visual recognition cues from context play a larger role. We achieve this with a human- and model-in-the-loop pipeline for annotating object shape, texture, and pose, providing visually grounded 3D reconstruction data at unprecedented scale. We learn from this data in a modern, multi-stage training framework that combines synthetic pretraining with real-world alignment, breaking the 3D "data barrier". We obtain significant gains over recent work, with at least a 5:1 win rate in human preference tests on real-world objects and scenes. We will release our code and model weights, an online demo, and a new challenging benchmark for in-the-wild 3D object reconstruction.

SAM 3D Team, Xingyu Chen, Fu-Jen Chu, Pierre Gleize, Kevin J Liang, Alexander Sax, Hao Tang, Weiyao Wang, Michelle Guo, Thibaut Hardin, Xiang Li, Aohan Lin, Jiawei Liu, Ziqi Ma, Anushka Sagar, Bowen Song, Xiaodong Wang, Jianing Yang, Bowen Zhang, Piotr Doll\'ar, Georgia Gkioxari, Matt Feiszli, Jitendra Malik• 2025

Related benchmarks

Task	Dataset	Result
3D Scene Generation	3D-Front (test)	CD (Surface)0.052	28
3D Reconstruction	GSO	CD Mean0.094	27
3D Asset Reconstruction	Toys4k	CD0.0354	18
Amodal 3D object generation	GSO	FID34.68	14
Task 3	LEGS	Success Rate3	12
Pose Estimation	Simulation	3D IoU46	12
Task 1	LEGS	Success Rate70	12
Task 2	LEGS	Success Rate5	12
Single-object generation	Toy4K	PSNR22.42	11
Novel View Synthesis	GSO-30	PSNR19.82	11

Showing 10 of 55 rows

Other info

GitHub

Follow for update

@wizwand_team Discord