CTRL-C: Camera calibration TRansformer with Line-Classification
About
Single image camera calibration is the task of estimating camera parameters, such as the vanishing points, focal length, and horizon line, from a single input image. In this work, we propose Camera calibration TRansformer with Line-Classification (CTRL-C), an end-to-end neural-network-based approach to single image camera calibration that directly estimates the camera parameters from an image and a set of line segments. Our network adopts the transformer architecture to capture the global structure of an image with multi-modal inputs in an end-to-end manner. We also propose an auxiliary task of line classification to train the network to extract global geometric information from lines effectively. Our experiments demonstrate that CTRL-C outperforms previous state-of-the-art methods on the Google Street View and SUN360 benchmark datasets.
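The abstract does not say how line-classification labels are produced. As a rough illustration only, one plausible way to decide whether a line segment converges to a given vanishing point is to measure the angle between the segment and the ray from its midpoint to that point. The function names, the finite vanishing point, and the 2° threshold below are all assumptions made for this sketch, not details taken from the paper.

```python
import math

def vp_consistency_deg(segment, vp):
    """Angle (degrees) between a segment and the ray from its midpoint to a
    vanishing point. Assumes a finite vanishing point that is not the
    midpoint, and a segment of non-zero length."""
    (x1, y1), (x2, y2) = segment
    mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # segment midpoint
    dx, dy = x2 - x1, y2 - y1                   # segment direction
    vx, vy = vp[0] - mx, vp[1] - my             # midpoint -> vanishing point
    # abs() makes the measure independent of the segment's endpoint order.
    cos = abs(dx * vx + dy * vy) / (math.hypot(dx, dy) * math.hypot(vx, vy))
    return math.degrees(math.acos(min(1.0, cos)))

def classify_segment(segment, vp, threshold_deg=2.0):
    """Hypothetical labeling rule: True if the segment points at the
    vanishing point to within threshold_deg (an assumed tolerance)."""
    return vp_consistency_deg(segment, vp) < threshold_deg

# A diagonal segment aimed at (10, 10) is consistent with that point;
# a horizontal segment directly below (0.5, 10) is not.
diagonal = ((0.0, 0.0), (1.0, 1.0))
horizontal = ((0.0, 0.0), (1.0, 0.0))
print(classify_segment(diagonal, (10.0, 10.0)))
print(classify_segment(horizontal, (0.5, 10.0)))
```

A rule like this would yield per-segment binary labels that an auxiliary classification head could be trained against; the actual labeling procedure and loss used by CTRL-C should be taken from the paper itself.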
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Perspective Field prediction | Stanford2D3D (test) | Up Mean | 7.39 | 12 |
| Perspective Field prediction | TartanAir (test) | Mean Angular Error (Up) | 7.32 | 12 |
| Camera Parameter Estimation | GSV uncentered (test) | Roll Mean Error | 1.92 | 6 |
| Monocular Camera Calibration | GSV dataset (test) | Mean FoV Error (°) | 3.59 | 6 |
| Object-centric prediction | Objectron 1.0 (isolated) | Up Mean Error | 7.49 | 5 |
| Object-centric prediction | Objectron 1.0 (crop) | Up Mean Error | 7.5 | 5 |
| Camera Parameter Estimation | GSV centered principal-point | Roll Error Mean (°) | 0.66 | 4 |