CalibAnyView: Beyond Single-View Camera Calibration in the Wild
About
Camera calibration is a fundamental prerequisite for reliable geometric perception, yet classical approaches rely on controlled acquisition setups that are impractical for in-the-wild imagery. Recent learning-based methods have shown promising results for single-view calibration, but inherently neglect geometric consistency across multiple views. We introduce CalibAnyView, a unified formulation that supports an arbitrary number of input views ($N \geq 1$) by explicitly modeling cross-view geometric consistency. To facilitate this, we construct a large-scale multi-view video dataset covering diverse real-world scenarios, including multiple camera models, dynamic scenes, realistic motion trajectories, and heterogeneous lens distortions. Building on this dataset, we develop a multi-view transformer that predicts dense perspective fields, which are further integrated into a geometric optimization framework to jointly estimate camera intrinsics and gravity direction. Extensive experiments demonstrate that CalibAnyView consistently outperforms state-of-the-art methods, achieves strong robustness under single-view settings, and further improves with multi-view inference, providing a reliable foundation for downstream tasks such as 3D reconstruction and robotic perception in the wild.
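To make the link between a dense perspective field and the quantities being estimated more concrete, here is a minimal sketch of the latitude component of a perspective field for an ideal pinhole camera. It is an illustration under stated assumptions, not the CalibAnyView model. It assumes the common parameterization in which each pixel stores the angle between its viewing ray and the horizontal plane, a centered principal point, no lens distortion, and a camera frame with x right, y down, z forward. The function names `vfov_to_focal` and `latitude_field` are hypothetical.

```python
import numpy as np


def vfov_to_focal(vfov_deg: float, height: int) -> float:
    """Focal length in pixels for a pinhole camera with the given vertical FoV."""
    return 0.5 * height / np.tan(np.radians(vfov_deg) / 2.0)


def latitude_field(width: int, height: int, focal: float, up_cam) -> np.ndarray:
    """Per-pixel latitude (degrees) of the viewing ray above the horizon.

    Assumptions (not taken from the paper): ideal pinhole, principal point at
    the image center, camera axes x right / y down / z forward. `up_cam` is
    the up direction (opposite to gravity) expressed in the camera frame.
    """
    up = np.asarray(up_cam, dtype=float)
    up = up / np.linalg.norm(up)

    # Pixel centers, shifted so the principal point is the origin.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    rays = np.stack(
        [(u - width / 2.0) / focal, (v - height / 2.0) / focal, np.ones_like(u)],
        axis=-1,
    )
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Latitude is the elevation of each ray relative to the horizontal plane.
    sin_lat = np.clip(rays @ up, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_lat))


if __name__ == "__main__":
    f = vfov_to_focal(90.0, 48)
    lat = latitude_field(64, 48, f, up_cam=(0.0, -1.0, 0.0))  # level camera
    print(lat.shape, float(lat[0, 32]), float(lat[-1, 32]))
```

For a level camera the field is positive in the upper half of the image and negative in the lower half. Changing the focal length or tilting `up_cam` changes the field, which is why a field of this kind carries information about both the intrinsics and the gravity direction.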
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Camera Understanding | MegaDepth | FoV AUC@1° | 14.8 | 31 |
| Camera Understanding | Stanford2D3D | FoV AUC@1° | 27.1 | 26 |
| Camera Understanding | TartanAir | FoV AUC@1° | 21.9 | 26 |
| Camera Understanding | LaMAR | FoV AUC@1° | 24.6 | 26 |
| Camera Calibration | Proposed Dataset (test) | Field of View (FoV) | 4.54 | 11 |
| Multi-view camera calibration | Stanford2D3D (157 windows) | vFoV error [°] | 3.37 | 7 |
| Multi-view camera calibration | TartanAir (205 windows) | Vertical FoV error | 3.12 | 7 |
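The two kinds of metric in the table, a field-of-view error and an AUC at a 1° threshold, can be sketched as follows. This is one common convention for AUC at a threshold: the area under the recall-versus-error curve, integrated up to the threshold and normalized by it. It is an assumption that the benchmarks above use this convention; the page does not state the exact protocol or whether results are reported as percentages. The function names are hypothetical.

```python
import numpy as np


def vfov_error_deg(pred_vfov_deg: float, gt_vfov_deg: float) -> float:
    """Absolute vertical field-of-view error in degrees."""
    return abs(pred_vfov_deg - gt_vfov_deg)


def auc_at_threshold(errors, threshold: float) -> float:
    """Normalized area under the recall curve for errors up to `threshold`.

    Returns a value in [0, 1]. Assumed convention, not confirmed by the source.
    """
    errors = np.sort(np.asarray(errors, dtype=float))
    recall = np.arange(1, len(errors) + 1) / len(errors)

    # Start the curve at (error=0, recall=0).
    errors = np.concatenate([[0.0], errors])
    recall = np.concatenate([[0.0], recall])

    # Keep points below the threshold, then extend the curve flat up to it.
    last = np.searchsorted(errors, threshold)
    e = np.concatenate([errors[:last], [threshold]])
    r = np.concatenate([recall[:last], [recall[last - 1]]])

    # Trapezoidal integration, written out to avoid version-specific helpers.
    area = np.sum(np.diff(e) * (r[1:] + r[:-1]) / 2.0)
    return float(area / threshold)


if __name__ == "__main__":
    per_image_errors = [0.5, 2.0]  # made-up errors in degrees, for illustration
    print(auc_at_threshold(per_image_errors, threshold=1.0))
```

A tight threshold such as 1° rewards only near-exact predictions, so AUC values at that threshold are typically much lower than at looser thresholds.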