HoHoNet: 360 Indoor Holistic Understanding with Latent Horizontal Features
About
We present HoHoNet, a versatile and efficient framework for holistic understanding of an indoor 360-degree panorama using a Latent Horizontal Feature (LHFeat). The compact LHFeat flattens the features along the vertical direction and has shown success in modeling per-column modality for room layout reconstruction. HoHoNet advances this approach in two important respects. First, the deep architecture is redesigned to run faster with improved accuracy. Second, we propose a novel horizon-to-dense module, which relaxes the per-column output shape constraint, allowing per-pixel dense prediction from LHFeat. HoHoNet is fast: it runs at 52 FPS and 110 FPS with ResNet-50 and ResNet-34 backbones, respectively, for modeling dense modalities from a high-resolution $512 \times 1024$ panorama. HoHoNet is also accurate. On the tasks of layout estimation and semantic segmentation, HoHoNet achieves results on par with the current state of the art. On dense depth estimation, HoHoNet outperforms all prior art by a large margin.
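To make the idea of "flattening the features along the vertical direction" concrete, here is a minimal sketch in plain Python. It is not the HoHoNet implementation: the abstract does not say how the vertical dimension is reduced, so plain averaging over rows is used as a stand-in assumption, and the function name `flatten_vertical` is hypothetical. The point is only the shape change, from a channels × height × width feature map to a channels × width one with a single vector per image column.

```python
def flatten_vertical(feature_map):
    """Collapse a [C][H][W] nested-list feature map to [C][W].

    Assumption: averaging over the height axis stands in for whatever
    reduction the real model uses; only the shape change is illustrated.
    """
    flattened = []
    for channel in feature_map:
        height = len(channel)
        width = len(channel[0])
        flattened.append([
            sum(channel[row][col] for row in range(height)) / height
            for col in range(width)
        ])
    return flattened


# Toy input: 1 channel, 2 rows, 3 columns.
toy = [[[1.0, 2.0, 3.0],
        [3.0, 4.0, 5.0]]]
lhfeat = flatten_vertical(toy)  # one value per channel per column
```

A per-column representation like this maps naturally onto per-column outputs such as layout boundaries. Dense per-pixel outputs need the height axis restored, which is the role the abstract assigns to the horizon-to-dense module.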
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Semantic segmentation | Stanford2D3DS (3-fold cross-validation) | mIoU | 56.73 | 90 |
| Monocular Depth Estimation | Stanford2D3D (test) | δ1 Accuracy | 90.54 | 71 |
| Monocular Depth Estimation | Matterport3D (test) | Delta Acc (< 1.25) | 87.86 | 48 |
| Semantic segmentation | Stanford2D3D Panoramic 1.0 (Fold-1) | mIoU | 52 | 43 |
| Semantic segmentation | Stanford2D3D-Panoramic (SPan) v1 (averaged by 3 folds) | mIoU | 52 | 39 |
| Depth Estimation | Matterport3D | delta1 | 94.15 | 35 |
| Semantic segmentation | Stanford2D3D | mIoU | 43.3 | 32 |
| Room Layout Estimation | MatterportLayout (test) | 2D IoU | 82.71 | 28 |
| Semantic segmentation | Structured3D (test) | mIoU | 66.99 | 21 |
| Monocular 360 Depth Estimation | Matterport3D official (test) | Delta Acc (1.25x) | 87.86 | 20 |
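
Several depth rows above report a threshold accuracy (δ1, delta1, or Delta Acc < 1.25). As a rough illustration, the sketch below uses the commonly used definition: the percentage of valid pixels for which max(pred/gt, gt/pred) falls below 1.25. The function name `delta_accuracy` and the rule of skipping non-positive depths are assumptions made for this example. The exact masking and evaluation protocol behind each benchmark row is not stated here and may differ.

```python
def delta_accuracy(pred, gt, threshold=1.25):
    """Percentage of depth pairs whose ratio error is below `threshold`.

    Assumption: pairs with a non-positive predicted or ground-truth depth
    are treated as invalid and skipped.
    """
    valid = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not valid:
        raise ValueError("no valid depth pairs")
    hits = sum(1 for p, g in valid if max(p / g, g / p) < threshold)
    return 100.0 * hits / len(valid)


# Toy example with four pixels (depths in metres).
pred = [1.0, 2.0, 3.0, 4.0]
gt = [1.1, 2.0, 4.0, 4.0]
score = delta_accuracy(pred, gt)  # third pixel has ratio 4/3 and misses
```

Because the metric is a ratio test, it is scale-relative: a 0.25 m error counts as a miss at 1 m depth but as a hit at 4 m.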