
Wid3R: Wide Field-of-View 3D Reconstruction via Camera Model Conditioning

About

We present Wid3R, a feed-forward neural network for visual geometry reconstruction that supports wide field-of-view camera models. Prior methods typically assume that input images are rectified or captured with pinhole cameras, since both their architectures and training datasets are tailored to perspective images only. These assumptions limit their applicability in real-world scenarios that use fisheye or panoramic cameras, and they often require careful calibration and undistortion. In contrast, Wid3R is a generalizable multi-view 3D estimation method that can model wide field-of-view camera types. Our approach leverages a ray representation with spherical harmonics and a novel camera model token within the network, enabling distortion-aware 3D reconstruction. Furthermore, Wid3R is the first multi-view foundation model to support feed-forward 3D reconstruction directly from 360° imagery. It demonstrates strong zero-shot robustness and consistently outperforms prior methods, achieving improvements of up to +77.33 on Stanford2D3D.
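To make the "ray representation with spherical harmonics" concrete, here is a minimal numpy sketch of one way such a per-pixel conditioning map could be built for a 360° (equirectangular) image: compute a unit viewing ray per pixel, then encode each direction with low-degree real spherical harmonics. The function names and the degree-1 truncation are illustrative assumptions, not the actual Wid3R implementation.

```python
import numpy as np

def equirect_rays(h, w):
    """Unit viewing-ray directions for an equirectangular (360) image of size h x w."""
    # Pixel centers: longitude in [-pi, pi), latitude in (-pi/2, pi/2).
    lon = (np.arange(w) + 0.5) / w * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)  # (h, w, 3), each row a unit vector

def sh_encode(dirs):
    """Real spherical-harmonic features (degree <= 1) of unit directions."""
    x, y, z = dirs[..., 0], dirs[..., 1], dirs[..., 2]
    return np.stack([
        0.282095 * np.ones_like(x),  # Y_0^0 (constant)
        0.488603 * y,                # Y_1^{-1}
        0.488603 * z,                # Y_1^0
        0.488603 * x,                # Y_1^1
    ], axis=-1)                      # (h, w, 4)

rays = equirect_rays(64, 128)
feats = sh_encode(rays)  # per-pixel conditioning map fed alongside the image
```

In a network, such a map would typically be concatenated to (or embedded with) the image tokens, and a separate learned camera model token would tell the backbone which projection model produced the rays; higher-degree harmonics simply add more channels.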

Dongki Jung, Jaehoon Choi, Adil Qureshi, Somi Jeong, Dinesh Manocha, Suyong Yeon • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Monocular 360 Depth Estimation | Matterport3D official (test) | Delta Acc (1.25x) | 94.8 | 20
Large-scale Localization | Matterport3D 2t7WUuJeko7 | Registration Count | 37 | 6
Large-scale Localization | Matterport3D 8194nk5LbLH | Registration Count Success Rate | 100 | 6
Large-scale Localization | Matterport3D pLe4wQe7qrG | Registered Count | 31 | 6
Camera pose estimation | Zip-NeRF (test) | ATE | 0.49 | 3
Camera pose estimation | FIORD Kitchen_In, meetingroom, and parakennus scenes | ATE | 0.44 | 3
Camera pose estimation | FIORD | RRA@30 | 100 | 3
Camera pose estimation | Stanford2D3D (area_5a and area_5b) | RRA@30 | 94.05 | 3
Point Map Estimation | ScanNet++ | Accuracy Mean | 0.018 | 3
Point Map Estimation | Matterport3D | Mean Accuracy | 9.4 | 3

(Showing 10 of 11 rows)
