
LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows

About

We introduce the Large Sparse Reconstruction Model to study how scaling transformer context windows impacts feed-forward 3D reconstruction. Although recent object-centric feed-forward methods deliver robust, high-quality reconstruction, they still lag behind dense-view optimization in recovering fine-grained texture and appearance. We show that expanding the context window -- by substantially increasing the number of active object and image tokens -- remarkably narrows this gap and enables high-fidelity 3D object reconstruction and inverse rendering. To scale effectively, we adapt native sparse attention in our architecture design, unlocking its capacity for 3D reconstruction with three key contributions: (1) an efficient coarse-to-fine pipeline that focuses computation on informative regions by predicting sparse high-resolution residuals; (2) a 3D-aware spatial routing mechanism that establishes accurate 2D-3D correspondences using explicit geometric distances rather than standard attention scores; and (3) a custom block-aware sequence parallelism strategy utilizing an All-gather-KV protocol to balance dynamic, sparse workloads across GPUs. As a result, LSRM handles 20x more object tokens and >2x more image tokens than prior state-of-the-art (SOTA) methods. Extensive evaluations on standard novel-view synthesis benchmarks show substantial gains over the current SOTA, yielding 2.5 dB higher PSNR and 40% lower LPIPS. Furthermore, when extending LSRM to inverse rendering tasks, qualitative and quantitative evaluations on widely-used benchmarks demonstrate consistent improvements in texture and geometry details, achieving an LPIPS that matches or exceeds that of SOTA dense-view optimization methods. Code and model will be released on our project page.
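Contribution (2) describes routing each group of object tokens to image tokens by explicit geometric distance instead of attention scores. The Python sketch below shows one way such routing could work. It rests on assumptions that the abstract does not state: object tokens are grouped into 3D blocks summarized by their center points, image tokens are grouped into square patch blocks, each view is a simple pinhole camera with its principal point at the image center, and each 3D block keeps a fixed top-k of image blocks per view. The function names (`project`, `route_blocks`) and all parameters are hypothetical. This is an illustration of distance-based routing, not LSRM's implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
# Hypothetical view format: (R, t, f) = world-to-camera rotation, translation, focal length in pixels.
View = Tuple[Mat3, Vec3, float]


def project(point: Vec3, R: Mat3, t: Vec3, f: float,
            cx: float, cy: float) -> Optional[Tuple[float, float, float]]:
    """Pinhole projection of a world-space point; returns (u, v, depth), or None if behind the camera."""
    # Camera-space coordinates: X_cam = R @ X_world + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    depth = xc[2]
    if depth <= 0.0:
        return None
    return (f * xc[0] / depth + cx, f * xc[1] / depth + cy, depth)


def route_blocks(block_centers: List[Vec3], views: List[View], patch: int,
                 width: int, height: int, k: int) -> List[List[Tuple[int, int, int]]]:
    """For each 3D block center, select up to k image blocks (view, row, col) per view,
    ranked by pixel distance between the projected center and each image block's center."""
    routes = []
    for center in block_centers:
        selected = []
        for view_idx, (R, t, f) in enumerate(views):
            proj = project(center, R, t, f, width / 2, height / 2)
            if proj is None:
                continue  # block lies behind this camera
            u, v, _ = proj
            if not (0.0 <= u < width and 0.0 <= v < height):
                continue  # block projects outside this image
            dists = []
            for row in range(height // patch):
                for col in range(width // patch):
                    du = u - (col + 0.5) * patch
                    dv = v - (row + 0.5) * patch
                    dists.append((math.hypot(du, dv), row, col))
            dists.sort()  # geometric distance, not an attention score, decides the ranking
            selected.extend((view_idx, row, col) for _, row, col in dists[:k])
        routes.append(selected)
    return routes
```

For example, with an identity rotation, translation (0, 0, 2), focal length 64, and a 64x64 image split into 16-pixel blocks, the point (0.25, 0, 0) projects to pixel (40, 32). `route_blocks` with k=2 therefore returns the two blocks in column 2 that straddle that pixel. A view whose camera sits in front of the point contributes nothing. Because the selection depends only on projected positions, it could plausibly be computed before attention runs, so the sparse blocks are known up front.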

Zhengqin Li, Cheng Zhang, Jakob Engel, Zhao Dong • 2026

Related benchmarks

Task                   Dataset                Metric            Result  Rank
Novel View Synthesis   GSO                    PSNR              33.08   25
Inverse Rendering      StanfordORB            PSNR (High Freq)  25.47   5
Inverse Rendering      DigitalTwinCatalogue   PSNR-H            29.67   4
Inverse Rendering      ObjectsWithLighting    PSNR              24.88   4
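PSNR, the metric reported in the table above and in the abstract's "2.5 dB higher PSNR" claim, is defined as 10 * log10(MAX^2 / MSE). Because the scale is logarithmic, a fixed dB gain corresponds to a multiplicative reduction in mean squared error. A 2.5 dB gain multiplies the MSE by 10^(-0.25), which is about 0.56, or roughly 44% lower error at the same peak value. The minimal Python sketch below computes PSNR on flat lists of pixel values in [0, 1] and performs that conversion. It is not the evaluation code of any listed benchmark, and it does not reproduce their specific protocols (masking, color space, or variants such as PSNR-H).

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized signals."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical signals have unbounded PSNR
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(delta_db: float) -> float:
    """Factor by which MSE is multiplied when PSNR rises by delta_db (same max_val)."""
    return 10.0 ** (-delta_db / 10.0)
```

For example, `psnr([0.5, 0.5], [0.6, 0.4])` gives an MSE of 0.01 and therefore 20 dB. `mse_ratio_for_gain(2.5)` is about 0.562. Benchmarks usually average PSNR over images, so a gain in mean PSNR maps only approximately onto a uniform reduction in per-image MSE.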
