Bidirectional Attention Network for Monocular Depth Estimation
About
In this paper, we propose a Bidirectional Attention Network (BANet), an end-to-end framework for monocular depth estimation (MDE) that addresses the difficulty convolutional neural networks have in effectively integrating local and global information. The structure of this mechanism derives from a strong conceptual foundation in neural machine translation, and it provides a lightweight mechanism for adaptive control of computation, similar to the dynamic nature of recurrent neural networks. We introduce bidirectional attention modules that use the feed-forward feature maps and incorporate global context to filter out ambiguity. Extensive experiments on two challenging datasets, KITTI and DIODE, demonstrate the advantage of this bidirectional attention model over feed-forward baselines and other state-of-the-art MDE methods. Our approach either outperforms or performs at least on par with state-of-the-art monocular depth estimation methods while requiring less memory and computation.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Monocular Depth Estimation | KITTI (test) | Abs Rel Error | 9.34 | 103 |
| Monocular Depth Estimation | KITTI official (val) | RMSE | 3.3 | 23 |
| Depth Estimation | KITTI public benchmark official (test) | SILog | 11.55 | 22 |
| Monocular Depth Estimation | KITTI online server (test) | SILog | 11.61 | 15 |
| Depth Estimation | KITTI (official split) | Absolute Relative Error | 2.29 | 10 |
| Monocular Depth Estimation | KITTI (official) | SILog | 11.61 | 9 |
| Monocular Depth Estimation | KITTI 2012 (test) | SILog | 11.55 | 8 |
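
For readers unfamiliar with the metrics in the table, the sketch below computes absolute relative error, RMSE, and the scale-invariant logarithmic error (SILog) using their commonly used definitions. It is not the paper's evaluation code: the function name `depth_metrics`, the flat-list inputs, the masking of pixels without ground truth (`gt <= 0`), and the factor of 100 on SILog are assumptions made for illustration, and the scaling conventions behind the values listed above may differ.

```python
import math

def depth_metrics(pred, gt):
    """Return (abs_rel, rmse, silog) for paired depth values.

    Hypothetical helper: `pred` and `gt` are flat sequences of depths.
    Pixels with gt <= 0 are treated as missing ground truth (assumption).
    Predictions are assumed to be strictly positive.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no pixels with valid ground truth")
    n = len(pairs)

    # Absolute relative error: mean of |pred - gt| / gt.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n

    # Root mean squared error in the units of the depth values.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)

    # SILog: standard deviation of the log-depth differences, scaled by 100.
    d = [math.log(p) - math.log(g) for p, g in pairs]
    mean_d = sum(d) / n
    var_d = sum(x * x for x in d) / n - mean_d ** 2
    silog = 100.0 * math.sqrt(max(var_d, 0.0))  # clamp tiny negative rounding

    return abs_rel, rmse, silog

# Usage: a prediction that is off by a constant factor of 2.
abs_rel, rmse, silog = depth_metrics([2.0, 4.0, 8.0], [1.0, 2.0, 4.0])
print(abs_rel, rmse, silog)
```

In this usage example the prediction is wrong by a global scale factor, so the absolute relative error and RMSE are large while SILog stays close to zero. That is the property that makes SILog scale-invariant.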