
Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

About

Self-supervised monocular depth estimation, which does not require ground truth for training, has attracted attention in recent years. It is of high interest to design lightweight but effective models so that they can be deployed on edge devices. Many existing architectures gain accuracy from heavier backbones at the expense of model size. This paper achieves comparable results with a lightweight architecture. Specifically, the efficient combination of CNNs and Transformers is investigated, and a hybrid architecture called Lite-Mono is presented. A Consecutive Dilated Convolutions (CDC) module and a Local-Global Features Interaction (LGFI) module are proposed. The former is used to extract rich multi-scale local features, and the latter takes advantage of the self-attention mechanism to encode long-range global information into the features. Experiments demonstrate that Lite-Mono outperforms Monodepth2 by a large margin in accuracy, with about 80% fewer trainable parameters.
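The key idea behind the CDC module is that stacking small kernels with increasing dilation rates enlarges the receptive field cheaply, without adding parameters. A minimal pure-Python 1-D sketch of this effect (the dilation schedule 1, 2, 3 and the averaging kernel are illustrative assumptions, not the paper's exact configuration):

```python
def dilated_conv1d(x, kernel, dilation):
    """1-D 'valid' convolution with a dilated kernel (pure-Python sketch)."""
    k = len(kernel)
    span = (k - 1) * dilation + 1          # input span covered by one kernel
    out = []
    for i in range(len(x) - span + 1):
        out.append(sum(kernel[j] * x[i + j * dilation] for j in range(k)))
    return out

# Stacking consecutive layers with growing dilation (1, 2, 3) expands the
# receptive field quickly while each layer keeps the same small kernel --
# the intuition behind CDC (illustrative schedule, not the paper's exact one).
x = list(range(16))
avg = [1 / 3, 1 / 3, 1 / 3]                # toy 3-tap averaging kernel
y = x
receptive = 1
for d in (1, 2, 3):
    y = dilated_conv1d(y, avg, d)
    receptive += 2 * d                     # (k - 1) * d added per layer
print(receptive)                           # three 3-tap layers cover 13 inputs
```

Three consecutive 3-tap layers thus see 13 input samples per output, whereas three undilated layers would see only 7, at identical parameter cost.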

Ning Zhang, Francesco Nex, George Vosselman, Norman Kerle• 2022

Related benchmarks

Task                                        Dataset                            Metric        Result  Rank
Monocular Depth Estimation                  KITTI (Eigen)                      Abs Rel       0.097   523
Monocular Depth Estimation                  KITTI (Eigen split)                Abs Rel       0.101   215
Monocular Depth Estimation                  Make3D (test)                      Abs Rel       0.305   132
Monocular Depth Estimation                  KITTI Eigen split (test)           AbsRel Mean   0.101   100
Monocular Depth Estimation                  KITTI Improved GT (Eigen)          AbsRel        0.102   92
Monocular Depth Estimation                  Sintel                             Abs Rel       0.383   91
Depth Estimation                            KITTI improved dense ground truth  Abs Rel       0.077   29
Self-supervised Monocular Depth Estimation  KITTI (Eigen)                      Abs Rel (%)   10.1    24
Monocular Depth Estimation                  KITTI Raw (Eigen)                  Abs Rel (%)   9.7     23
Depth Estimation                            nuScenes day-clear                 RMSE          6.818   22

(10 of 21 rows shown)
