LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation

About

We propose a training-free method for open-vocabulary semantic segmentation using Vision-and-Language Models (VLMs). Our approach enhances the initial per-patch predictions of VLMs through label propagation, which jointly optimizes predictions by incorporating patch-to-patch relationships. Since VLMs are primarily optimized for cross-modal alignment and not for intra-modal similarity, we use a Vision Model (VM) that is observed to better capture these relationships. We address resolution limitations inherent to patch-based encoders by applying label propagation at the pixel level as a refinement step, significantly improving segmentation accuracy near class boundaries. Our method, called LPOSS+, performs inference over the entire image, avoiding window-based processing and thereby capturing contextual interactions across the full image. LPOSS+ achieves state-of-the-art performance among training-free methods, across a diverse set of datasets. Code: https://github.com/vladan-stojnic/LPOSS

Vladan Stojnić, Yannis Kalantidis, Jiří Matas, Giorgos Tolias • 2025
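
To make the label-propagation idea in the abstract concrete, here is a minimal sketch of generic graph-based label propagation over patch features. It is not the authors' implementation (see the linked repository for that). The function name `label_propagation`, the parameters `k` and `alpha`, the cosine-similarity k-nearest-neighbor graph, and the closed-form solve are all assumptions chosen for illustration. Here `features` stands in for patch features from a vision model and `initial_scores` for the per-patch class scores from a VLM.

```python
import numpy as np


def label_propagation(features, initial_scores, k=2, alpha=0.9):
    """Generic label propagation on a kNN graph (illustrative, not LPOSS itself).

    features:       (n_patches, dim) patch features, e.g. from a vision model.
    initial_scores: (n_patches, n_classes) initial per-patch class scores.
    Returns refined scores with the same shape as initial_scores.
    """
    n = features.shape[0]

    # Cosine similarity between patches; drop self-loops and negative values.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    sim = np.clip(sim, 0.0, None)

    # Keep only the k strongest neighbors per patch, then symmetrize.
    top_k = np.argsort(-sim, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    adjacency = np.zeros_like(sim)
    adjacency[rows, top_k] = sim[rows, top_k]
    adjacency = np.maximum(adjacency, adjacency.T)

    # Symmetrically normalized adjacency: S = D^{-1/2} W D^{-1/2}.
    degree = adjacency.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    normalized = adjacency * inv_sqrt[:, None] * inv_sqrt[None, :]

    # Closed-form propagation: solve (I - alpha * S) F = Y for F.
    return np.linalg.solve(np.eye(n) - alpha * normalized, initial_scores)


# Toy input: two groups of three similar patches; the third patch starts
# with a noisy prediction that disagrees with its neighbors.
features = np.array([[1.0, 0.0], [0.99, 0.1], [0.98, -0.1],
                     [0.0, 1.0], [0.1, 0.99], [-0.1, 0.98]])
initial_scores = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6],
                           [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])
refined = label_propagation(features, initial_scores)
print(refined.argmax(axis=1))
```

In this toy input the third patch initially favors class 1, while its two nearest neighbors favor class 0; propagation pulls its prediction toward the neighbors' class. The dense solve is only practical for small graphs; a real system would likely use sparse matrices and an iterative solver.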

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Open Vocabulary Semantic Segmentation | COCOStuff (val) | mIoU 25.9 | 60 |
| Open Vocabulary Semantic Segmentation | Cityscapes (val) | mIoU 37.3 | 37 |
| Open Vocabulary Semantic Segmentation | PASCAL Context 59 (val) | mIoU 37.8 | 32 |
| Open-Vocabulary Segmentation | Pascal VOC 21 2012 (val) | mIoU 61.1 | 27 |
| Open-Vocabulary Segmentation | Pascal Context 60 (val) | mIoU 34.6 | 26 |
| Open-Vocabulary Segmentation | ADE20K (ADE) (val) | mIoU 21.8 | 25 |
| Open-Vocabulary Segmentation | COCO-Object (COCO-O) (val) | mIoU 33.4 | 25 |
| Open-Vocabulary Segmentation | Pascal VOC 20 2012 (val) | mIoU 78.8 | 23 |
| Open-Vocabulary Segmentation | Natural-scene (NS) benchmark suite: V21, PC60, COCO-O, V20, PC59, COCO-S, City, ADE | V21 mIoU (with background) 61.1 | 18 |