Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks
About
This work investigates the use of deep fully convolutional neural networks (DFCNN) for pixel-wise scene labeling of Earth Observation images. Specifically, we train a variant of the SegNet architecture on remote sensing data over an urban area and study different strategies for performing accurate semantic segmentation. Our contributions are the following: 1) we efficiently transfer a DFCNN from generic everyday images to remote sensing images; 2) we introduce a multi-kernel convolutional layer for fast aggregation of predictions at multiple scales; 3) we perform data fusion from heterogeneous sensors (optical and laser) using residual correction. Our framework improves state-of-the-art accuracy on the ISPRS Vaihingen 2D Semantic Labeling dataset.
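The second and third contributions can be illustrated with a minimal NumPy sketch. It assumes, for illustration only, three parallel kernels of sizes 3, 5 and 7 whose outputs are averaged, and a fusion rule that adds a correction term to the average of the two sensor predictions. The function names, kernel sizes, and random weights are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


def conv2d_same(x, kernel):
    """Zero-padded 'same' 2D cross-correlation of a single-channel map."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(x, pad, mode="constant")
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out


def multi_kernel_layer(features, kernels):
    """Average the outputs of parallel convolutions with different kernel sizes."""
    outputs = [conv2d_same(features, k) for k in kernels]
    return np.mean(outputs, axis=0)


def residual_correction(pred_optical, pred_laser, correction):
    """Fuse two predictions: their average plus a correction term.

    In a trained network the correction would be predicted by a small
    learned module; here it is simply passed in (assumption).
    """
    return 0.5 * (pred_optical + pred_laser) + correction


rng = np.random.default_rng(0)
features = rng.random((8, 8))
# Hypothetical kernel sizes; weights are random here, learned in practice.
kernels = [rng.standard_normal((s, s)) for s in (3, 5, 7)]
scores = multi_kernel_layer(features, kernels)
print(scores.shape)
```

Larger kernels see more spatial context around each pixel, so averaging the branches combines fine and coarse evidence in a single layer rather than running the whole network at several input resolutions.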
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Map extraction | Porto regions (test) | IoU | 68.8 | 18 |
| Map extraction | Shanghai regions (test) | IoU | 60.3 | 18 |
| Map extraction | Singapore regions (test) | IoU | 56.5 | 18 |
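For reference, IoU (intersection over union) measures the overlap between a predicted mask and the ground truth. The sketch below assumes binary masks given as nested lists and reports the score as a percentage; the benchmark's exact evaluation protocol is not stated here, so this is only an illustration of the metric, and the `iou` helper and toy masks are hypothetical.

```python
def iou(pred, target):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly by convention (assumption).
    return inter / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(f"IoU = {100 * iou(pred, target):.1f}")  # prints "IoU = 50.0"
```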