Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation

About

Recently, deep learning-based methods have revolutionized remote sensing image segmentation. However, these methods usually rely on a pre-defined set of semantic classes, so adapting them to new classes requires additional image annotation and model training. More importantly, they are unable to segment arbitrary semantic classes. In this work, we introduce Open-Vocabulary Remote Sensing Image Semantic Segmentation (OVRSISS), which aims to segment arbitrary semantic classes in remote sensing images. To address the lack of OVRSISS datasets, we develop LandDiscover50K, a comprehensive dataset of 51,846 images covering 40 diverse semantic classes. In addition, we propose a novel framework named GSNet that integrates domain priors from specialist remote sensing models with the versatile capabilities of general vision-language models. Technically, GSNet consists of a Dual-Stream Image Encoder (DSIE), a Query-Guided Feature Fusion (QGFF) module, and a Residual Information Preservation Decoder (RIPD). DSIE first captures comprehensive features from both specialist and generalist models in two streams. Then, guided by variable vocabularies, QGFF integrates the specialist and generalist features so that they complement each other. Finally, RIPD aggregates multi-source features for more accurate mask predictions. Experiments show that our method outperforms other methods by a large margin, and that our proposed LandDiscover50K improves the performance of OVRSISS methods. The proposed dataset and method will be made publicly available at https://github.com/yecy749/GSNet.
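
As a rough illustration of the data flow described in the abstract (two image feature streams, fusion guided by the text vocabulary, then a prediction step that keeps a residual path), here is a minimal NumPy sketch. It is not the GSNet implementation. The function name `open_vocab_segment`, the cosine-similarity scoring, the softmax weighting over the two streams, and the residual addition are all assumptions made for illustration; the actual DSIE, QGFF, and RIPD are learned network modules whose internals are not specified here.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def open_vocab_segment(special_feats, general_feats, text_embeds):
    """Toy open-vocabulary segmentation (hypothetical, for illustration only).

    special_feats, general_feats: (H, W, D) per-pixel features from the
        two streams (stand-ins for the dual-stream encoder outputs).
    text_embeds: (K, D) one embedding per class name; K may change freely
        between calls, which is what makes the vocabulary "open".
    Returns (labels, logits) with shapes (H, W) and (H, W, K).
    """
    s = l2_normalize(special_feats)
    g = l2_normalize(general_feats)
    t = l2_normalize(text_embeds)

    # Cosine similarity of every pixel to every class query, per stream.
    sim_s = s @ t.T  # (H, W, K)
    sim_g = g @ t.T  # (H, W, K)

    # Assumed stand-in for query-guided fusion: for each pixel and class,
    # weight the two streams by a softmax over their similarities.
    stacked = np.stack([sim_s, sim_g], axis=-1)  # (H, W, K, 2)
    weights = softmax(stacked, axis=-1)
    fused = (weights * stacked).sum(axis=-1)     # (H, W, K)

    # Assumed stand-in for a residual path: add the general-stream
    # similarity back onto the fused scores before the final decision.
    logits = fused + sim_g
    return logits.argmax(axis=-1), logits
```

Because the class list enters only through `text_embeds`, the same function can be called with a different number of class embeddings without any retraining step, which is the property the open-vocabulary setting relies on.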

Chengyang Ye, Yunzhi Zhuge, Pingping Zhang • 2024

Related benchmarks

Task	Dataset	Result
Semantic segmentation	LoveDA	mIoU 78.2	192
Semantic segmentation	Vaihingen	mIoU 44.13	168
Semantic segmentation	iSAID	mIoU 93.11	146
Semantic segmentation	Potsdam	mIoU 43.2	110
Semantic segmentation	LoveDA	mIoU 32.52	97
Semantic segmentation	VDD	mIoU 38.1	87
Semantic segmentation	UAVid	mIoU 25.42	70
Road Extraction	Massachusetts	mIoU 50.88	67
Semantic segmentation	UDD5	mIoU 40.92	66
Building Extraction	xBD pre	IoU 76.23	50

Showing 10 of 34 rows
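
Most results above are reported as mIoU (mean intersection over union). For readers unfamiliar with the metric, the following is a minimal sketch of how it can be computed from a predicted and a ground-truth label map. The function name `mean_iou` and the choice to skip classes absent from both maps are assumptions; individual benchmarks may instead accumulate counts over a whole dataset, ignore certain labels, or handle absent classes differently. The listed values appear to be percentages, which would correspond to this result multiplied by 100.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class c appears in neither map; skip it (assumption)
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")

# Tiny 2x2 example with two classes.
pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
score = mean_iou(pred, target, num_classes=2)  # (1/2 + 2/3) / 2, about 0.583
```

In the example, class 0 has intersection 1 and union 2, and class 1 has intersection 2 and union 3, so the mean of the two per-class IoUs is 7/12.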
