PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction
About
We introduce PixARMesh, a method to autoregressively reconstruct complete 3D indoor scene meshes directly from a single RGB image. Unlike prior methods that rely on implicit signed distance fields and post-hoc layout optimization, PixARMesh jointly predicts object layout and geometry within a unified model, producing coherent and artist-ready meshes in a single forward pass. Building on recent advances in mesh generative models, we augment a point-cloud encoder with pixel-aligned image features and global scene context via cross-attention, enabling accurate spatial reasoning from a single image. Scenes are generated autoregressively from a unified token stream containing context, pose, and mesh, yielding compact meshes with high-fidelity geometry. Experiments on synthetic and real-world datasets show that PixARMesh achieves state-of-the-art reconstruction quality while producing lightweight, high-quality meshes ready for downstream applications.
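The paper does not release code here, but the core conditioning idea — projecting 3D points into the image to gather pixel-aligned features, then fusing them with global scene context via cross-attention — can be sketched in a minimal numpy form. All names (`project_points`, `sample_features`, `cross_attention`), the nearest-neighbor sampling, the single attention head, and the residual fusion are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def project_points(points, K):
    # points: (N, 3) camera-space coordinates; K: (3, 3) intrinsics.
    # Returns (N, 2) pixel coordinates via perspective projection.
    uv = (K @ points.T).T
    return uv[:, :2] / uv[:, 2:3]

def sample_features(feat_map, uv):
    # feat_map: (H, W, C) image feature grid. Nearest-neighbor lookup
    # (a real system would use bilinear sampling) -> (N, C).
    H, W, _ = feat_map.shape
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    return feat_map[v, u]

def cross_attention(queries, keys, values):
    # Single-head scaled dot-product attention: each query (point token)
    # attends over all key/value tokens (global scene context).
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ values

rng = np.random.default_rng(0)
pts = rng.normal(size=(64, 3)) + np.array([0.0, 0.0, 4.0])  # points in front of camera
K = np.array([[100.0, 0.0, 64.0],
              [0.0, 100.0, 64.0],
              [0.0,   0.0,  1.0]])
feat = rng.normal(size=(128, 128, 32))   # stand-in for an image feature map
ctx = rng.normal(size=(8, 32))           # stand-in global scene-context tokens

pix_feats = sample_features(feat, project_points(pts, K))       # (64, 32) pixel-aligned
fused = pix_feats + cross_attention(pix_feats, ctx, ctx)        # residual fusion, (64, 32)
```

The fused per-point features would then condition the point-cloud encoder; the actual attention layout, head count, and fusion scheme in PixARMesh are unspecified here.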
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Scene Reconstruction | ScanNet, Matterport3D, Pix3D | Runtime (s) | 4.5 | 9 |
| 3D Scene Reconstruction | 3D-FRONT | F Value | 7.51e+3 | 9 |
| Scene Reconstruction | 3D-FRONT | CD | 0.0984 | 8 |
| Object Pose Accuracy | 3D-FRONT | Box IoU | 70.37 | 7 |
| Object Reconstruction | 3D-FRONT | Chamfer Distance (CD) | 0.004 | 7 |