Attention meets Geometry: Geometry Guided Spatial-Temporal Attention for Consistent Self-Supervised Monocular Depth Estimation

About

Inferring geometrically consistent dense 3D scenes across a tuple of temporally consecutive images remains challenging for self-supervised monocular depth prediction pipelines. This paper explores how the increasingly popular transformer architecture, together with novel regularized loss formulations, can improve depth consistency while preserving accuracy. We propose a spatial attention module that correlates coarse depth predictions to aggregate local geometric information. A novel temporal attention mechanism further processes the local geometric information in a global context across consecutive images. Additionally, we introduce geometric constraints between frames regularized by photometric cycle consistency. By combining our proposed regularization and the novel spatial-temporal-attention module we fully leverage both the geometric and appearance-based consistency across monocular frames. This yields geometrically meaningful attention and improves temporal depth stability and accuracy compared to previous methods.

Patrick Ruhkamp, Daoyi Gao, Hanzhi Chen, Nassir Navab, Benjamin Busam • 2021

Related benchmarks

Task                       | Dataset                   | Result          | Rank
Depth Estimation           | KITTI (Eigen split)       | RMSE 3.222      | 276
Monocular Depth Estimation | KITTI (Eigen split)       | Abs Rel 0.071   | 193
Monocular Depth Estimation | DDAD (test)               | RMSE 15.121     | 122
Monocular Depth Estimation | KITTI Improved GT (Eigen) | AbsRel 0.113    | 92
Depth Estimation           | DDAD (val)                | Sq Rel 3.788    | 31
Video Depth Estimation     | KITTI (Eigen split)       | Delta1 Acc 92.1 | 9
Video Depth Estimation     | KITTI                     | rTC 0.901       | 9
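
This page does not define the metrics named in the benchmark results. As a rough guide, the sketch below computes Abs Rel, Sq Rel, RMSE and Delta1 using the definitions commonly used in monocular depth evaluation; it is an assumption that the listed numbers follow exactly these definitions. The function name `depth_metrics` and its return keys are hypothetical. Benchmark-specific protocol details (depth caps, crops, scale alignment) and the rTC metric are not covered.

```python
import math


def depth_metrics(pred, gt):
    """Common depth-error metrics over paired predicted / ground-truth depths.

    pred, gt: equal-length sequences of positive depth values (e.g. metres).
    Hypothetical helper for illustration; not taken from the paper's code.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and equal in length")
    n = len(pred)
    pairs = list(zip(pred, gt))

    # Abs Rel: mean of |pred - gt| / gt
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Sq Rel: mean of (pred - gt)^2 / gt
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # RMSE: square root of the mean squared error
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Delta1: fraction of pixels with max(pred/gt, gt/pred) < 1.25
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n

    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, "delta1": delta1}


if __name__ == "__main__":
    # Toy values only; these are not results from the paper.
    print(depth_metrics([10.0, 20.0], [10.0, 10.0]))
```

Note that `delta1` is returned as a fraction in [0, 1], whereas the Delta1 Acc entry in the results appears to be reported as a percentage. Lower is better for Abs Rel, Sq Rel and RMSE; higher is better for Delta1.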