
Rethinking Image-to-3D Generation with Sparse Queries: Efficiency, Capacity, and Input-View Bias

About

We present SparseGen, a novel framework for efficient image-to-3D generation that exhibits low input-view bias while being significantly faster than dense-representation alternatives. Unlike traditional approaches that rely on dense volumetric grids, triplanes, or pixel-aligned primitives, we model scenes with a compact, sparse set of learned 3D anchor queries and a learned expansion operator that decodes each transformed query into a small local set of 3D Gaussian primitives. Trained under a rectified-flow reconstruction objective without 3D supervision, our model learns to allocate representation capacity where geometry and appearance matter, achieving significant reductions in memory and inference time while preserving multi-view fidelity. We introduce quantitative measures of input-view bias and utilization to show that sparse queries reduce overfitting to conditioning views while being representationally efficient. Our results argue that sparse set-latent expansion is a principled, practical alternative for efficient 3D generative modeling.
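To make the "sparse set-latent expansion" idea concrete, the following is a minimal NumPy sketch of the core shape arithmetic: a small set of anchor queries, each decoded by an expansion operator into a handful of 3D Gaussian parameter vectors. The query count, latent dimension, per-query Gaussian count, and the linear stand-in for the learned expansion operator are all illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_QUERIES = 64   # sparse anchor queries (assumed count)
D_QUERY = 32     # latent dimension per query (assumed)
K_LOCAL = 8      # Gaussians decoded per query (assumed)
# Each Gaussian parameter vector (assumed layout):
# 3 (mean offset) + 3 (scale) + 4 (rotation quaternion) + 3 (color) + 1 (opacity) = 14
D_GAUSS = 14

# Learned anchor queries, and a random linear map standing in for the
# learned expansion operator (in practice this would be a trained network).
queries = rng.normal(size=(N_QUERIES, D_QUERY))
W_expand = rng.normal(size=(D_QUERY, K_LOCAL * D_GAUSS)) * 0.1

def expand(queries: np.ndarray) -> np.ndarray:
    """Decode each query into K_LOCAL local Gaussian parameter vectors."""
    raw = queries @ W_expand                       # (N, K_LOCAL * D_GAUSS)
    return raw.reshape(len(queries), K_LOCAL, D_GAUSS)

gaussians = expand(queries)                        # (64, 8, 14)
total_primitives = gaussians.shape[0] * gaussians.shape[1]
print(gaussians.shape, total_primitives)           # (64, 8, 14) 512
```

The point of the sketch is capacity allocation: 64 queries expand to only 512 primitives, orders of magnitude fewer than a dense volumetric grid, and where those primitives land is learned rather than fixed to a grid or to input pixels.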

Zhiyuan Xu, Jiuming Liu, Yuxin Chen, Masayoshi Tomizuka, Chenfeng Xu, Chensheng Peng• 2026

Related benchmarks

Task                            Dataset                                Metric  Result  Rank
Novel View Synthesis            Google Scanned Objects (GSO) (test)    PSNR    21.427  24
Single-view 3D Reconstruction   SRN Cars (test)                        PSNR    24.018  7
Single-view Reconstruction      CO3D Hydrant (held-out target view)    PSNR    20.366  2
Single-view Reconstruction      CO3D Teddybear (held-out target view)  PSNR    19.005  2
