Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation
About
Using multiple spatial modalities has been proven helpful in improving semantic segmentation performance. However, several real-world challenges have yet to be addressed: (a) improving label efficiency and (b) enhancing robustness in realistic scenarios where modalities are missing at test time. To address these challenges, we first propose a simple yet efficient multi-modal fusion mechanism, Linear Fusion, that performs better than state-of-the-art multi-modal models even with limited supervision. Second, we propose M3L: Multi-modal Teacher for Masked Modality Learning, a semi-supervised framework that not only improves multi-modal performance but also makes the model robust to the realistic missing-modality scenario using unlabeled data. We create the first benchmark for semi-supervised multi-modal semantic segmentation and also report robustness to missing modalities. Our proposal shows an absolute improvement of up to 10% in robust mIoU over the most competitive baselines. Our code is available at https://github.com/harshm121/M3L
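The core idea of Linear Fusion described above can be illustrated with a minimal sketch: per-modality feature maps are combined by a weighted sum, and at test time a missing modality can be handled by zeroing its features and reweighting the rest. The function and variable names below are illustrative assumptions, not the repository's actual API.

```python
import numpy as np

def linear_fusion(feats, weights=None):
    """Fuse per-modality feature maps by a weighted sum.

    feats: dict mapping modality name -> feature array of identical shape.
    weights: optional dict of per-modality scalar weights; defaults to a
    uniform average. (Illustrative sketch, not the paper's exact mechanism.)
    """
    mods = list(feats)
    if weights is None:
        weights = {m: 1.0 / len(mods) for m in mods}
    return sum(weights[m] * feats[m] for m in mods)

# Toy features for two spatial modalities (H x W x C).
rgb = np.ones((4, 4, 8))
depth = np.full((4, 4, 8), 3.0)

# Standard fusion: uniform average of both modalities.
fused = linear_fusion({"rgb": rgb, "depth": depth})

# Missing-modality scenario: depth is unavailable at test time, so its
# features are zeroed and all weight shifts to RGB.
robust = linear_fusion({"rgb": rgb, "depth": np.zeros_like(depth)},
                       weights={"rgb": 1.0, "depth": 0.0})
```

Because the fusion is a simple weighted sum, dropping a modality degrades the fused representation gracefully rather than breaking a learned cross-modal interaction, which is one intuition for why such a mechanism pairs well with masked-modality training.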
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | DeLiVER | mIoU (Mean) | 47.64 | 30 |
| Semantic segmentation | DFC 2023 | R Score | 94.14 | 20 |
| Semantic segmentation | ISPRS (test) | R | 41.27 | 10 |
| Semantic segmentation | ISPRS | mIoU (R) | 30.72 | 10 |
| Semantic segmentation | Stanford Indoor (0.1% labeled, 49 samples) | mIoU | 40.05 | 8 |
| Semantic segmentation | Stanford Indoor (0.2% labeled, 98 samples) | mIoU | 44.62 | 8 |
| Semantic segmentation | Stanford Indoor (1% labeled, 491 samples) | mIoU | 49.28 | 8 |
| Semantic segmentation | SUN RGBD (6.25% labeled, 297 samples) | mIoU (RGB) | 29.92 | 6 |
| Semantic segmentation | SUN RGBD (12.5% labeled, 594 samples) | mIoU (RGB) | 38.12 | 6 |
| Semantic segmentation | SUN RGBD (25% labeled, 1189 samples) | mIoU (RGB) | 41.31 | 6 |