ConInfer: Context-Aware Inference for Training-Free Open-Vocabulary Remote Sensing Segmentation

About

Training-free open-vocabulary remote sensing segmentation (OVRSS), empowered by vision-language models, has emerged as a promising paradigm for achieving category-agnostic semantic understanding in remote sensing imagery. Existing approaches mainly focus on enhancing feature representations or mitigating modality discrepancies to improve patch-level prediction accuracy. However, such independent prediction schemes are fundamentally misaligned with the intrinsic characteristics of remote sensing data. In real-world applications, remote sensing scenes are typically large-scale and exhibit strong spatial as well as semantic correlations, making isolated patch-wise predictions insufficient for accurate segmentation. To address this limitation, we propose ConInfer, a context-aware inference framework for OVRSS that performs joint prediction across multiple spatial units while explicitly modeling their inter-unit semantic dependencies. By incorporating global contextual cues, our method significantly enhances segmentation consistency, robustness, and generalization in complex remote sensing environments. Extensive experiments on multiple benchmark datasets demonstrate that our approach consistently surpasses state-of-the-art per-pixel VLM-based baselines such as SegEarth-OV, achieving average improvements of 2.80% and 6.13% on open-vocabulary semantic segmentation and object extraction tasks, respectively. The implementation code is available at: https://github.com/Dog-Yang/ConInfer
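The contrast the abstract draws between independent patch-wise prediction and joint, context-aware prediction can be illustrated with a small NumPy sketch. This is not the ConInfer algorithm, whose details are not given here; it swaps in a generic affinity-based smoothing of patch probabilities as a stand-in. The function names (`independent_predict`, `context_refine`) and the parameters (`alpha`, `n_iters`, `temperature`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def independent_predict(patch_feats, text_feats, temperature=0.07):
    """Baseline: each patch is classified on its own from its
    cosine similarity to the class text embeddings."""
    sims = l2_normalize(patch_feats) @ l2_normalize(text_feats).T
    return softmax(sims / temperature, axis=1)

def context_refine(patch_feats, probs, alpha=0.5, n_iters=10, temperature=0.1):
    """Illustrative joint refinement (an assumption, not the paper's method):
    repeatedly mix each patch's class probabilities with those of
    patches that have similar features."""
    p = l2_normalize(patch_feats)
    # Row-stochastic affinity between patches.
    affinity = softmax(p @ p.T / temperature, axis=1)
    refined = probs.copy()
    for _ in range(n_iters):
        refined = alpha * (affinity @ refined) + (1.0 - alpha) * probs
    return refined

# Toy usage with random stand-ins for VLM patch and text embeddings.
rng = np.random.default_rng(0)
patch_feats = rng.normal(size=(6, 8))   # 6 patches, 8-dim features
text_feats = rng.normal(size=(3, 8))    # 3 candidate class names
probs = independent_predict(patch_feats, text_feats)
refined = context_refine(patch_feats, probs)
labels = refined.argmax(axis=1)         # one class index per patch
```

Because the affinity matrix is row-stochastic and each update is a convex combination, every row of `refined` remains a valid probability distribution. Setting `alpha=0` recovers the independent baseline.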

Wenyang Chen, Zhanxuan Hu, Yaping Zhang, Hailong Ning, Yonghang Tai • 2026

Related benchmarks

Task	Dataset	Result
Semantic segmentation	LoveDA	mIoU 39.33	192
Semantic segmentation	Vaihingen	mIoU 31.37	168
Semantic segmentation	iSAID	mIoU 20.08	146
Semantic segmentation	Potsdam	mIoU 49.99	110
Semantic segmentation	VDD	mIoU 50.29	87
Semantic segmentation	UAVid	mIoU 46.4	70
Road Extraction	Massachusetts	mIoU 12.16	67
Semantic segmentation	UDD5	mIoU 46.86	66
Building Extraction	INRIA	mIoU 55.65	50
Building Extraction	xBD pre	IoU 41.34	50

Showing 10 of 25 rows
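For readers unfamiliar with the metric reported above, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from a confusion matrix. The exact evaluation protocol behind these results (ignored labels, per-image versus per-dataset averaging) is not stated here, so the function `mean_iou` and its handling of absent classes are assumptions. It returns a fraction in [0, 1], whereas the table values appear to be percentages.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the target."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both pred and target
    return float((intersection[valid] / union[valid]).mean())

# Toy usage: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

For single-class tasks such as building or road extraction, the same confusion matrix yields the per-class IoU directly as `intersection / union` for the class of interest.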
