Wid3R: Wide Field-of-View 3D Reconstruction via Camera Model Conditioning
About
We present Wid3R, a feed-forward neural network for multi-view visual geometry reconstruction that supports wide field-of-view camera models. Unlike existing methods that assume rectified or pinhole inputs, Wid3R directly models wide-angle imagery without explicit calibration or undistortion. Our approach leverages a ray-based representation with spherical harmonics and introduces a novel camera model token to enable distortion-aware reconstruction. To the best of our knowledge, Wid3R is the first multi-frame feed-forward 3D reconstruction method that supports 360° imagery. Moreover, we show that conditioning on diverse camera types improves generalization to 360° scenes and alleviates data-sparsity issues. Wid3R achieves significant performance gains, improving AUC@30 by up to +33.67 on Zip-NeRF (fisheye) and +77.33 on Stanford2D3D (360°).
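To illustrate the ray-based representation, the sketch below maps each pixel of an equirectangular (360°) image to a unit ray direction and encodes it with a low-degree real spherical-harmonics basis. This is a hypothetical, minimal illustration with NumPy; the paper's exact parameterization, SH degree, and camera-model token design are not specified here, and the function names (`equirect_rays`, `sh_encode`) are our own.

```python
import numpy as np

def equirect_rays(h, w):
    """Per-pixel unit ray directions for an equirectangular (360) image.

    Longitude spans [-pi, pi), latitude [pi/2, -pi/2] top to bottom.
    Returns an (h, w, 3) array of unit vectors.
    """
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)

def sh_encode(dirs):
    """Real spherical-harmonics features up to degree 1 (4 coefficients).

    A real network would likely use a higher degree; degree 1 keeps the
    sketch short. Constants are the standard real SH normalizations.
    """
    x, y, z = dirs[..., 0], dirs[..., 1], dirs[..., 2]
    c0 = 0.28209479177387814  # Y_0^0
    c1 = 0.4886025119029199   # Y_1^{-1}, Y_1^0, Y_1^1
    return np.stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x], axis=-1)

# Dense per-pixel ray map and its SH encoding for a tiny 4x8 panorama.
rays = equirect_rays(4, 8)
feats = sh_encode(rays)
```

A distortion-aware model could then consume `feats` in place of (or alongside) pixel coordinates, so fisheye and 360° inputs share one representation without undistortion.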
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Monocular 360 Depth Estimation | Matterport3D official (test) | Delta Acc (1.25x) | 94.8 | 20 |
| Point Map Estimation | ScanNet++ | -- | -- | 16 |
| Large-scale Localization | Matterport3D 2t7WUuJeko7 | Registration Count | 37 | 6 |
| Large-scale Localization | Matterport3D 8194nk5LbLH | Registration Count Success Rate | 100 | 6 |
| Large-scale Localization | Matterport3D pLe4wQe7qrG | Registered Count | 31 | 6 |
| Camera pose estimation | Zip-NeRF (test) | ATE | 0.49 | 3 |
| Camera pose estimation | FIORD (Kitchen_In, meetingroom, and parakennus scenes) | ATE | 0.44 | 3 |
| Camera pose estimation | FIORD | RRA@30 | 100 | 3 |
| Camera pose estimation | Stanford2D3D (area_5a and area_5b) | RRA@30 | 94.05 | 3 |
| Point Map Estimation | Matterport3D | Mean Accuracy | 9.4 | 3 |