MobileStereoNet: Towards Lightweight Deep Networks for Stereo Matching
About
Recent methods in stereo matching have continuously improved accuracy using deep models. This gain, however, comes at a steep increase in computation cost, such that the network may not fit even on a moderate GPU. This becomes a problem when the model needs to be deployed on resource-limited devices. To address this, we propose two lightweight models for stereo vision with reduced complexity and without sacrificing accuracy. Depending on the dimension of the cost volume, we design a 2D and a 3D model with encoder-decoders built from 2D and 3D convolutions, respectively. To this end, we leverage 2D MobileNet blocks and extend them to 3D for stereo vision applications. In addition, a new cost volume is proposed to boost the accuracy of the 2D model, bringing its performance close to that of 3D networks. Experiments show that the proposed 2D/3D networks effectively reduce the computational expense (27%/95% and 72%/38% fewer parameters/operations in 2D and 3D models, respectively) while upholding the accuracy. Our code is available at https://github.com/cogsys-tuebingen/mobilestereonet.
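
To give a rough sense of why MobileNet-style blocks shrink a 3D network, the sketch below compares the weight count of a standard 3D convolution with that of a depthwise-separable one (a depthwise 3D convolution followed by a 1x1x1 pointwise convolution). The channel width of 32, the kernel size of 3, and the function names are assumptions chosen for illustration. This is not the authors' exact block design, and the resulting figure is not meant to reproduce the percentages quoted in the abstract.

```python
def conv3d_params(c_in, c_out, k=3):
    """Weights in a standard 3D convolution (bias omitted)."""
    return k ** 3 * c_in * c_out


def separable_conv3d_params(c_in, c_out, k=3):
    """Weights in a depthwise 3D conv followed by a 1x1x1 pointwise conv."""
    depthwise = k ** 3 * c_in   # one k*k*k filter per input channel
    pointwise = c_in * c_out    # 1x1x1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer size, chosen only for illustration.
standard = conv3d_params(32, 32)
separable = separable_conv3d_params(32, 32)
saving = 1 - separable / standard

print(f"standard: {standard} weights")
print(f"separable: {separable} weights")
print(f"relative saving: {saving:.1%}")
```

The saving grows with the kernel volume, which is why factorizing 3D convolutions tends to pay off more than factorizing 2D ones at the same channel width.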
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Stereo Matching | KITTI 2015 (test) | D1 Error (Overall) | 2.1 | 144 |
| Stereo Matching | KITTI 2015 | D1 Error (All) | 2.83 | 118 |
| Stereo Matching | Scene Flow (test) | EPE | 0.8 | 70 |
| Stereo Depth Estimation | Middlebury 2014 (train) | AbsRel | 0.139 | 8 |
| Stereo Matching | KITTI 2015 (test) | Memory (MB) | 7.99 | 7 |
| Stereo Matching | KITTI 2015 (val) | EPE (px) | 0.66 | 7 |
| Stereo Depth Estimation | Middlebury 2014 | AbsRel | 13.7 | 6 |
| Stereo Depth Estimation | DTU Robot Image Dataset Unrectified | AbsRel | 0.147 | 6 |
| Stereo Depth Estimation | SceneFlow | AbsRel | 0.129 | 6 |
| Stereo Depth Estimation | Aria Digital Twin (ADT) | AbsRel | 0.135 | 6 |
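
For readers unfamiliar with the EPE and D1 metrics reported above, the following is a minimal sketch of how they are commonly computed on disparity maps. It assumes flat lists of disparities, treats a ground-truth value of 0 as invalid, and counts a pixel as a D1 outlier when its error exceeds both 3 px and 5% of the true disparity. These thresholds mirror widely used open-source evaluation code; boundary conventions differ slightly between implementations, and the function names here are hypothetical.

```python
def valid_errors(pred, gt):
    """Absolute disparity errors paired with ground truth, valid pixels only."""
    # Assumption: ground-truth disparity <= 0 marks a pixel without a label.
    return [(abs(p - g), g) for p, g in zip(pred, gt) if g > 0]


def epe(pred, gt):
    """End-point error: mean absolute disparity error in pixels."""
    errs = valid_errors(pred, gt)
    return sum(e for e, _ in errs) / len(errs)


def d1_error(pred, gt, abs_thresh=3.0, rel_thresh=0.05):
    """Fraction of valid pixels whose error exceeds both thresholds."""
    errs = valid_errors(pred, gt)
    outliers = sum(1 for e, g in errs if e > abs_thresh and e / g > rel_thresh)
    return outliers / len(errs)


# Toy example: the last pixel has no ground truth and is ignored.
pred = [10.0, 20.0, 54.0, 100.0]
gt = [10.5, 20.0, 50.0, 0.0]

print(f"EPE: {epe(pred, gt):.3f} px")
print(f"D1:  {d1_error(pred, gt):.1%}")
```

Benchmark tables usually report D1 as a percentage, so the fraction returned here would be multiplied by 100 before comparison.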