Fisheye3R: Adapting Unified 3D Feed-Forward Foundation Models to Fisheye Lenses
About
Feed-forward foundation models for multi-view three-dimensional (3D) reconstruction have been trained on large-scale datasets of perspective images; when tested on wide field-of-view images, e.g., from a fisheye camera, their performance degrades. The error arises because a fisheye lens maps 3D points onto the 2D image plane through a non-linear projection model, which shifts pixels away from the positions a perspective model would predict. While one may surmise that training on fisheye images would resolve this problem, there are far fewer fisheye images with ground truth than perspective images, which limits generalization. To enable inference on imagery exhibiting high radial distortion, we propose Fisheye3R, a novel adaptation framework that extends these multi-view 3D reconstruction foundation models to natively accommodate fisheye inputs without performance regression on perspective images. To address the scarcity of fisheye images and ground truth, we introduce flexible learning schemes that support self-supervised adaptation using only unlabeled perspective images and supervised adaptation without any fisheye training data. Extensive experiments across three foundation models, namely VGGT, $\pi^3$, and MapAnything, demonstrate that our approach consistently improves camera pose, depth, point map, and field-of-view estimation on fisheye images.
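The pixel shift described above can be made concrete with a minimal sketch. It assumes an ideal pinhole camera ($r = f\tan\theta$) and the equidistant fisheye model ($r = f\theta$), which is only one common fisheye model; the abstract does not say which fisheye model Fisheye3R targets, and all function names here are hypothetical.

```python
import math

def pinhole_radius(theta: float, f: float) -> float:
    """Pinhole (perspective) model: r = f * tan(theta). Only defined for theta < 90 deg."""
    if theta >= math.pi / 2:
        raise ValueError("pinhole model cannot image points at or beyond 90 degrees")
    return f * math.tan(theta)

def equidistant_radius(theta: float, f: float) -> float:
    """Equidistant fisheye model (an assumed example): r = f * theta, finite even past 90 deg."""
    return f * theta

def project(point, f, radius_fn):
    """Project a camera-frame point (x, y, z) to image offsets (u, v) from the principal point."""
    x, y, z = point
    rho = math.hypot(x, y)            # distance from the optical axis
    if rho == 0.0:
        if z > 0:
            return 0.0, 0.0           # in front of the camera on the axis: principal point
        raise ValueError("azimuth undefined for points behind the camera on the axis")
    theta = math.atan2(rho, z)        # incidence angle measured from the optical axis
    r = radius_fn(theta, f)           # radial image distance under the chosen model
    return r * x / rho, r * y / rho   # keep the azimuth, scale the offset to radius r

def horizontal_fov(width: float, f: float, model: str) -> float:
    """Horizontal field of view in radians for an image `width` pixels wide, centered principal point."""
    half = width / (2.0 * f)
    if model == "pinhole":
        return 2.0 * math.atan(half)  # inverts r = f * tan(theta)
    if model == "equidistant":
        return 2.0 * half             # inverts r = f * theta
    raise ValueError(f"unknown model: {model}")
```

For a point 45° off-axis with $f = 1$, the pinhole model places it at $r = 1$ while the equidistant model places it at $r = \pi/4 \approx 0.785$; the gap widens toward the image border, and the equidistant model can still image points beyond 90°, which a pinhole camera cannot. The same focal length and width also yield different fields of view under the two models (90° versus about 115° for width 2 and $f = 1$).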
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Depth Estimation | ScanNet++ | AbsRel | 0.171 | 40 |
| Pose Estimation | KITTI-360 | RPE Translation (cm) | 23.9 | 29 |
| Point Map Estimation | ScanNet++ | CD | 0.051 | 16 |
| Depth Map Estimation | ADT | Relative Error (Rel) | 7.3 | 12 |
| Depth Map Estimation | KITTI-360 | Relative Error (Rel) | 9.1 | 12 |
| FoV Map Estimation | ScanNet++ | Horizontal Error (hErr) | 2.941 | 12 |
| FoV Map Estimation | ADT | Horizontal Error (hErr) | 1.161 | 12 |
| FoV Map Estimation | KITTI-360 | Horizontal Error (hErr) | 2.963 | 12 |
| Pose Estimation | ADT | RRA | 1 | 12 |
| Point Map Estimation | ADT | Accuracy | 14 | 12 |
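The depth rows above report relative depth error. Under the standard AbsRel definition (the mean of $|\hat{d} - d| / d$ over pixels with valid ground truth), it can be computed as in the sketch below. The benchmark's exact protocol (valid-pixel masking, depth caps, and whether or how scale-ambiguous predictions are aligned) is not given here, so the optional median alignment and the function name `abs_rel` are assumptions.

```python
import statistics

def abs_rel(pred, gt, align_median=False):
    """Mean absolute relative depth error |d_pred - d_gt| / d_gt over valid pixels.

    Pixels with non-positive ground truth are treated as invalid and skipped.
    If align_median is True (an assumed, optional step), predictions are first
    rescaled by median(gt) / median(pred) to remove a global scale ambiguity.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    if align_median:
        scale = statistics.median(g for _, g in pairs) / statistics.median(p for p, _ in pairs)
        pairs = [(p * scale, g) for p, g in pairs]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)
```

Median alignment is included because feed-forward reconstruction models often predict geometry only up to scale; without some alignment, a prediction that is correct up to a global factor would still score a large AbsRel.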