Déjà View: Looping Transformers for Multi-View 3D Reconstruction

About

Recent feed-forward 3D reconstruction transformers have scaled to over a billion parameters, following the broader trend of increasing model capacity in computer vision. Yet emerging evidence suggests that contiguous transformer layers often behave like repeated applications of similar operations, and multi-view reconstruction transformers refine their predictions progressively across decoder depth. We posit that model depth partially buys iteration, paid for inefficiently in unique parameters, and instead make that iteration explicit in architecture. Our model, DéjàView, applies a single looped transformer block recurrently to per-view features for K refinement steps. Trained once, it exposes K as an inference-time compute knob, matching or outperforming substantially larger feed-forward baselines across five reconstruction benchmarks spanning indoor, outdoor, object-centric, and driving scenes, while using a fraction of their parameters and comparable or lower compute. Importantly, the same looped block formulation outperforms an otherwise identical variant with independent per-step parameters under matched training data and compute, suggesting that explicit iteration is not merely a compute-efficient substitute for capacity but a stronger inductive bias for multi-view 3D reconstruction.
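
The contrast between a looped block and independent per-step blocks can be illustrated with a toy sketch. This is not the authors' architecture: `ToyBlock`, `looped_forward`, and `unshared_forward` are hypothetical names, each view is reduced to a single feature vector, and the block is a bare single-head attention update with a residual connection (no MLP, normalization, or prediction heads). It only shows how one set of weights can be reused for K steps, so that K becomes a free choice at inference time while the parameter count stays fixed.

```python
import math
import random


def matvec(W, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToyBlock:
    """Hypothetical stand-in for a transformer block: attention across views plus a residual."""

    def __init__(self, dim, rng):
        def init():
            return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

        self.dim = dim
        self.Wq, self.Wk, self.Wv = init(), init(), init()

    def num_params(self):
        return 3 * self.dim * self.dim

    def __call__(self, feats):
        q = [matvec(self.Wq, f) for f in feats]
        k = [matvec(self.Wk, f) for f in feats]
        v = [matvec(self.Wv, f) for f in feats]
        out = []
        for i, f in enumerate(feats):
            # Each view attends to every view (including itself).
            scores = [
                sum(a * b for a, b in zip(q[i], kj)) / math.sqrt(self.dim) for kj in k
            ]
            w = softmax(scores)
            mixed = [
                sum(w[j] * v[j][d] for j in range(len(feats))) for d in range(self.dim)
            ]
            # Residual update: the block refines the features rather than replacing them.
            out.append([fi + mi for fi, mi in zip(f, mixed)])
        return out


def looped_forward(block, feats, K):
    """Apply the SAME block K times; K can differ between calls."""
    for _ in range(K):
        feats = block(feats)
    return feats


def unshared_forward(blocks, feats):
    """Baseline variant: one independently parameterized block per step."""
    for block in blocks:
        feats = block(feats)
    return feats


rng = random.Random(0)
dim, num_views = 4, 3
views = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_views)]

shared = ToyBlock(dim, rng)
refined_k2 = looped_forward(shared, views, K=2)  # cheaper inference
refined_k4 = looped_forward(shared, views, K=4)  # more refinement, same weights

unshared = [ToyBlock(dim, rng) for _ in range(4)]
print("looped params:  ", shared.num_params())
print("unshared params:", sum(b.num_params() for b in unshared))
```

In this sketch the looped variant's parameter count is independent of K, whereas the unshared variant grows linearly with the number of steps. Whether the looped form also gives better reconstructions is the empirical claim of the abstract and is not something this toy demonstrates.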

Alessandro Burzio, Tobias Fischer, Sven Elflein, Qunjie Zhou, Riccardo de Lutio, Jiawei Ren, Jiahui Huang, Shengyu Huang, Marc Pollefeys, Laura Leal-Taixé, Zan Gojcic, Haithem Turki • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Point Map Estimation | 7 Scenes | -- | 69 |
| Camera pose estimation | ScanNet++ | AUC @ 30°: 98 | 17 |
| Camera pose estimation | 7 Scenes | AUC@3°: 13.9 | 17 |
| 3D Reconstruction | Average of five benchmarks (DTU, ETH3D, nuScenes, ScanNet++, 7-Scenes) | IR: 80.3 | 9 |
| Camera pose estimation | nuScenes | AUC@3: 43.4 | 9 |
| Pointmap Accuracy | ETH3D | Relative L2 Error: 0.026 | 9 |
| Pointmap Accuracy | ScanNet++ | Rel. L2: 0.015 | 9 |
| Pointmap Accuracy | nuScenes | Relative L2 Error: 6.7 | 9 |
| Pointmap Accuracy | DTU | Relative L2 Error: 0.009 | 9 |
