
Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

About

Self-supervised monocular depth estimation that does not require ground truth for training has attracted attention in recent years. It is of high interest to design lightweight but effective models so that they can be deployed on edge devices. Many existing architectures benefit from using heavier backbones at the expense of model sizes. This paper achieves comparable results with a lightweight architecture. Specifically, the efficient combination of CNNs and Transformers is investigated, and a hybrid architecture called Lite-Mono is presented. A Consecutive Dilated Convolutions (CDC) module and a Local-Global Features Interaction (LGFI) module are proposed. The former is used to extract rich multi-scale local features, and the latter takes advantage of the self-attention mechanism to encode long-range global information into the features. Experiments demonstrate that Lite-Mono outperforms Monodepth2 by a large margin in accuracy, with about 80% fewer trainable parameters.
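The CDC module stacks consecutive dilated convolutions so that a few cheap 3×3 kernels cover a wide spatial context. As a rough illustration of why this helps, the sketch below computes the effective receptive field of such a stack; the dilation schedule used here is hypothetical and not taken from the paper.

```python
# Illustrative sketch: effective receptive field of consecutive stride-1
# dilated convolutions, as used in a CDC-style block. The dilation rates
# below are assumptions for illustration, not the paper's configuration.
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer adds (k - 1) * dilation
    return rf

# Three 3x3 convolutions with increasing dilation vs. a plain 3x3 stack.
print(receptive_field(3, [1, 2, 3]))  # 13: multi-scale, wide context
print(receptive_field(3, [1, 1, 1]))  # 7: undilated baseline
```

With the same parameter count, the dilated stack nearly doubles the context each pixel sees, which is the intuition behind extracting "rich multi-scale local features" without a heavier backbone.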

Ning Zhang, Francesco Nex, George Vosselman, Norman Kerle · 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel | 0.097 | 502 |
| Monocular Depth Estimation | KITTI (Eigen split) | Abs Rel | 0.101 | 193 |
| Monocular Depth Estimation | Make3D (test) | Abs Rel | 0.305 | 132 |
| Monocular Depth Estimation | KITTI Improved GT (Eigen) | Abs Rel | 0.102 | 92 |
| Depth Estimation | KITTI improved dense ground truth | Abs Rel | 0.077 | 29 |
| Monocular Depth Estimation | KITTI Raw (Eigen) | Abs Rel | 9.7 | 23 |
| Depth Estimation | nuScenes day-clear | RMSE | 6.818 | 22 |
| Monocular Depth Estimation | DDAD | Abs Rel Error | 0.161 | 17 |
| Monocular Depth Estimation | C3VD (test) | Abs Rel | 0.146 | 16 |
| Monocular Depth Estimation | Waymo Open Dataset | Abs Rel | 0.154 | 13 |

Showing 10 of 18 benchmark results.
