UniQueR: Unified Query-based Feedforward 3D Reconstruction
About
We present UniQueR, a unified query-based feedforward framework for efficient and accurate 3D reconstruction from unposed images. Existing feedforward models such as DUSt3R, VGGT, and AnySplat typically predict per-pixel point maps or pixel-aligned Gaussians, which remain fundamentally 2.5D and limited to visible surfaces. In contrast, UniQueR formulates reconstruction as a sparse 3D query inference problem. Our model learns a compact set of 3D anchor points that act as explicit geometric queries, enabling the network to infer scene structure, including geometry in occluded regions, in a single forward pass. Each query encodes spatial and appearance priors directly in global 3D space (instead of per-frame camera space) and spawns a set of 3D Gaussians for differentiable rendering. By leveraging unified query interactions across multi-view features and a decoupled cross-attention design, UniQueR achieves strong geometric expressiveness while substantially reducing memory and computational cost. Experiments on Mip-NeRF 360 and VR-NeRF demonstrate that UniQueR surpasses state-of-the-art feedforward methods in both rendering quality and geometric accuracy while using an order of magnitude fewer primitives than dense alternatives.
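To make the query-based design concrete, here is a minimal PyTorch sketch of the core idea. All names and dimensions (`QueryBasedReconstructor`, `num_queries`, `gaussians_per_query`, the single decoupled-attention layer, the 14-parameter Gaussian layout) are illustrative assumptions, not the paper's released code: learnable 3D anchor queries cross-attend to flattened multi-view image features through separate geometry and appearance streams, and each refined query decodes K Gaussian primitives in global 3D space.

```python
# Hypothetical sketch of a query-based feedforward reconstructor.
# Not the authors' implementation; shapes and heads are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QueryBasedReconstructor(nn.Module):
    def __init__(self, num_queries=2048, dim=256, gaussians_per_query=8):
        super().__init__()
        # Learnable 3D anchor points and their appearance embeddings:
        # spatial and appearance priors live directly in global 3D space.
        self.anchor_xyz = nn.Parameter(torch.randn(num_queries, 3) * 0.1)
        self.query_feat = nn.Parameter(torch.randn(num_queries, dim))
        self.pos_embed = nn.Linear(3, dim)  # lift anchors to feature dim
        # Decoupled cross-attention (assumed reading of the design):
        # geometry and appearance streams attend to the views separately.
        self.geo_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.app_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Each query spawns K Gaussians: offset (3) + log-scale (3) +
        # rotation quaternion (4) + opacity (1) + RGB (3) = 14 params.
        self.K = gaussians_per_query
        self.gaussian_head = nn.Linear(dim, gaussians_per_query * 14)

    def forward(self, mv_feats):
        """mv_feats: (B, V*H*W, dim) flattened multi-view image tokens."""
        B = mv_feats.shape[0]
        q = self.query_feat + self.pos_embed(self.anchor_xyz)   # (Q, dim)
        q = q.unsqueeze(0).expand(B, -1, -1)                    # (B, Q, dim)
        # Geometry stream refines where each query looks; appearance
        # stream gathers color evidence. Summing is one simple fusion.
        geo, _ = self.geo_attn(q, mv_feats, mv_feats)
        app, _ = self.app_attn(q, mv_feats, mv_feats)
        q = self.norm(q + geo + app)
        g = self.gaussian_head(q).view(B, -1, self.K, 14)
        # Gaussian centers = anchor position + bounded local offset,
        # so geometry stays in global 3D space, not per-frame space.
        xyz = self.anchor_xyz.view(1, -1, 1, 3) + 0.05 * torch.tanh(g[..., :3])
        scales = g[..., 3:6].exp()                    # positive scales
        quats = F.normalize(g[..., 6:10], dim=-1)     # unit quaternions
        opacity = g[..., 10:11].sigmoid()
        rgb = g[..., 11:14].sigmoid()
        return xyz, scales, quats, opacity, rgb
```

Because the query budget is fixed and small relative to the pixel count, the attention and Gaussian heads in a sketch like this scale with the number of queries rather than the image resolution, which is one plausible reading of the memory and compute savings claimed above.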
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Camera pose estimation | RealEstate10K | AUC@30 | 83.69 | 26 |
| Novel view synthesis | Mip-NeRF 360 (32 views) | PSNR | 25.26 | 8 |
| Novel view synthesis | Mip-NeRF 360 (64 views) | PSNR | 26 | 8 |
| Novel view synthesis | VR-NeRF (32 views) | PSNR | 27.03 | 8 |
| Novel view synthesis | VR-NeRF (64 views) | PSNR | 28.56 | 8 |
| Novel view synthesis | Mip-NeRF (test) | PSNR | 22.7 | 6 |
| Novel view synthesis | VR-NeRF (test) | PSNR | 21.99 | 6 |