
DiSa: Saliency-Aware Foreground-Background Disentangled Framework for Open-Vocabulary Semantic Segmentation

About

Open-vocabulary semantic segmentation aims to assign a label to every pixel in an image, where the candidate labels are given as text. Existing approaches typically use vision-language models (VLMs) such as CLIP for dense prediction. However, because VLMs are pre-trained on image-text pairs, they are biased toward salient, object-centric regions and exhibit two critical limitations when adapted to segmentation: (i) foreground bias, a tendency to neglect background regions, and (ii) limited spatial localization, which blurs object boundaries. To address these limitations, we introduce DiSa, a saliency-aware foreground-background disentangled framework. By explicitly incorporating saliency cues in our Saliency-aware Disentanglement Module (SDM), DiSa models foreground and background ensemble features separately in a divide-and-conquer manner. We further propose a Hierarchical Refinement Module (HRM) that leverages pixel-wise spatial context and performs channel-wise feature refinement through multi-level updates. Extensive experiments on six benchmarks show that DiSa consistently outperforms state-of-the-art methods.
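To make the foreground/background idea more concrete, here is a minimal sketch of one generic way a saliency map could gate two dense classification branches built on CLIP-style cosine similarity between per-pixel and text embeddings. It is not the SDM described above and does not reproduce DiSa: the two feature branches, the per-pixel saliency map, the gating rule `s * fg + (1 - s) * bg`, the temperature value, and the function names (`dense_logits`, `saliency_gated_logits`) are all hypothetical assumptions for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each vector to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def dense_logits(pixel_feats, text_embeds, temperature=0.01):
    # pixel_feats: (H, W, D) per-pixel visual embeddings.
    # text_embeds: (C, D), one embedding per candidate class name.
    # Returns (H, W, C): cosine similarity of every pixel to every class,
    # divided by an assumed temperature.
    p = l2_normalize(pixel_feats)
    t = l2_normalize(text_embeds)
    return (p @ t.T) / temperature


def saliency_gated_logits(fg_feats, bg_feats, text_embeds, saliency, temperature=0.01):
    # HYPOTHETICAL gating rule for illustration only, not DiSa's SDM.
    # fg_feats, bg_feats: (H, W, D) features from two separate branches.
    # saliency: (H, W) values in [0, 1]; higher means "more likely foreground".
    fg_logits = dense_logits(fg_feats, text_embeds, temperature)
    bg_logits = dense_logits(bg_feats, text_embeds, temperature)
    s = np.clip(saliency, 0.0, 1.0)[..., None]  # (H, W, 1), broadcasts over classes
    return s * fg_logits + (1.0 - s) * bg_logits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, D, C = 4, 4, 8, 3
    fg = rng.normal(size=(H, W, D))
    bg = rng.normal(size=(H, W, D))
    text = rng.normal(size=(C, D))
    saliency = np.zeros((H, W))
    saliency[1:3, 1:3] = 1.0  # a salient "object" in the centre
    labels = saliency_gated_logits(fg, bg, text, saliency).argmax(axis=-1)
    print(labels.shape)  # (4, 4)
```

In this sketch, high-saliency pixels take their class scores mostly from the foreground branch and low-saliency pixels from the background branch, so each branch can specialize; the argmax over the class axis gives an (H, W) map of class indices.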

Zhen Yao, Xin Li, Taotao Jing, Shuai Zhang, Mooi Choo Chuah · 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Semantic segmentation | PASCAL-Context PC-59 (59 classes, val) | mIoU | 64.7 | 125
Semantic segmentation | ADE20K A-150 (val) | mIoU | 38.9 | 65
Semantic segmentation | PASCAL-Context PC-459 (val) | mIoU | 24.9 | 60
Semantic segmentation | ADE20K A-847 (847 categories, val) | mIoU | 16.3 | 31
Semantic segmentation | PASCAL VOC PAS-20 (20 foreground categories, val) | mIoU | 98.7 | 21
Semantic segmentation | PASCAL VOC PAS-20b (20 foreground categories + background, val) | mIoU | 84.7 | 9
Open-vocabulary semantic segmentation | A-847, PC-459, A-150, PC-59, PAS-20, PAS-20b (test/val) | Parameters (millions) | 456.2 | 6
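Most rows above report mean intersection-over-union (mIoU). As a rough sketch of how that metric is commonly computed (the standard confusion-matrix formulation, not evaluation code from the DiSa authors), per-class IoU is TP / (TP + FP + FN) and mIoU averages it over classes. The function names and the `ignore_index=255` convention for void pixels are assumptions. The sketch returns a fraction in [0, 1], whereas the table reports percentages (for example, 64.7 corresponds to 0.647).

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    # Drop pixels marked with ignore_index (255 for "void" is an assumed convention).
    valid = target != ignore_index
    p = pred[valid].astype(np.int64)
    t = target[valid].astype(np.int64)
    # Rows index the ground-truth class, columns the predicted class.
    counts = np.bincount(t * num_classes + p, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    # IoU_c = TP / (TP + FP + FN). Classes absent from both prediction and
    # ground truth (union == 0) are excluded from the mean and from the
    # returned per-class array.
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0
    iou = tp[present] / union[present]
    return float(iou.mean()), iou


if __name__ == "__main__":
    pred = np.array([[0, 0], [1, 1]])
    target = np.array([[0, 1], [1, 1]])
    miou, per_class = mean_iou(confusion_matrix(pred, target, num_classes=2))
    print(round(miou, 4))  # 0.5833
```

For benchmark-style evaluation, confusion matrices are typically summed over the whole validation set and `mean_iou` is called once on the total, rather than averaging per-image scores.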
