Decoder Modulation for Indoor Depth Completion
About
Depth completion recovers a dense depth map from sensor measurements. Current methods are mostly tailored for very sparse depth measurements from LiDARs in outdoor settings, while for indoor scenes Time-of-Flight (ToF) or structured light sensors are mostly used. These sensors provide semi-dense maps, with dense measurements in some regions and almost empty in others. We propose a new model that takes into account the statistical difference between such regions. Our main contribution is a new decoder modulation branch added to the encoder-decoder architecture. The encoder extracts features from the concatenated RGB image and raw depth. Given the mask of missing values as input, the proposed modulation branch controls the decoding of a dense depth map from these features differently for different regions. This is implemented by modifying the spatial distribution of output signals inside the decoder via Spatially-Adaptive Denormalization (SPADE) blocks. Our second contribution is a novel training strategy that allows us to train on a semi-dense sensor data when the ground truth depth map is not available. Our model achieves the state of the art results on indoor Matterport3D dataset. Being designed for semi-dense input depth, our model is still competitive with LiDAR-oriented approaches on the KITTI dataset. Our training strategy significantly improves prediction quality with no dense ground truth available, as validated on the NYUv2 dataset.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Depth Completion | NYU-depth-v2 official (test) | RMSE0.205 | 187 | |
| Depth Completion | Matterport3D (test) | RMSE0.961 | 16 | |
| Depth Completion | ScanNet (test) | MAE0.054 | 10 | |
| Indoor Depth Completion | NYU-Depth Setting A: R ⇒ T V2 (test) | RMSE0.205 | 9 |