LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation

About

We propose a training-free method for open-vocabulary semantic segmentation using Vision-and-Language Models (VLMs). Our approach enhances the initial per-patch predictions of VLMs through label propagation, which jointly optimizes predictions by incorporating patch-to-patch relationships. Since VLMs are primarily optimized for cross-modal alignment and not for intra-modal similarity, we use a Vision Model (VM) that is observed to better capture these relationships. We address resolution limitations inherent to patch-based encoders by applying label propagation at the pixel level as a refinement step, significantly improving segmentation accuracy near class boundaries. Our method, called LPOSS+, performs inference over the entire image, avoiding window-based processing and thereby capturing contextual interactions across the full image. LPOSS+ achieves state-of-the-art performance among training-free methods across a diverse set of datasets. Code: https://github.com/vladan-stojnic/LPOSS
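
The patch-level idea described above can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that); it uses the classic iterative label-propagation update Y <- alpha * S * Y + (1 - alpha) * Y0, and whether LPOSS uses exactly this update is an assumption. Here `vlm_scores` stands in for the initial per-patch class scores from a VLM and `vm_features` for patch features from a VM; both are made-up numbers, and the function names, `alpha`, and `n_iters` are hypothetical choices for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def normalized_affinity(features):
    """Patch-to-patch affinity S = D^-1/2 W D^-1/2 from clipped cosine similarity."""
    n = len(features)
    # W: non-negative similarities, zero on the diagonal (no self-loops).
    w = [[max(cosine(features[i], features[j]), 0.0) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    return [[w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] * deg[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]

def propagate(s, y0, alpha=0.9, n_iters=50):
    """Iterate Y <- alpha * S @ Y + (1 - alpha) * Y0 for a fixed number of steps."""
    n, c = len(y0), len(y0[0])
    y = [row[:] for row in y0]
    for _ in range(n_iters):
        y = [[alpha * sum(s[i][k] * y[k][j] for k in range(n))
              + (1 - alpha) * y0[i][j]
              for j in range(c)] for i in range(n)]
    return y

def argmax(row):
    """Index of the largest score in a row."""
    return max(range(len(row)), key=row.__getitem__)

# Toy data (assumed): 4 patches, 2 classes.
# Patches 0 and 1 look alike, as do patches 2 and 3.
vm_features = [[1.0, 0.1], [1.0, -0.1], [-1.0, 0.1], [-1.0, -0.1]]
# Patch 1 has a noisy initial score that favors the wrong class.
vlm_scores = [[0.9, 0.1], [0.4, 0.6], [0.1, 0.9], [0.2, 0.8]]

initial = [argmax(row) for row in vlm_scores]
refined = propagate(normalized_affinity(vm_features), vlm_scores)
preds = [argmax(row) for row in refined]

print(initial)  # [0, 1, 1, 1]
print(preds)    # [0, 0, 1, 1]
```

In this toy case, patch 1 starts with the wrong label but is pulled toward the confident prediction of its visually similar neighbor, patch 0. The same update could in principle be run on a pixel-level graph for the refinement step mentioned in the abstract, though the details of that step are not covered here.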

Vladan Stojnić, Yannis Kalantidis, Jiří Matas, Giorgos Tolias • 2025

Related benchmarks

Task	Dataset	Result
Open Vocabulary Semantic Segmentation	Pascal VOC 20	mIoU 89.6	113
Open Vocabulary Semantic Segmentation	Pascal Context PC-59	mIoU 35.2	99
Open Vocabulary Semantic Segmentation	COCO Stuff without background	mIoU 42.1	90
Open Vocabulary Semantic Segmentation	COCO Object with background	mIoU 42.1	87
Open Vocabulary Semantic Segmentation	Cityscapes	mIoU 37.9	81
Open Vocabulary Semantic Segmentation	ADE20K	mIoU 22.3	80
Open Vocabulary Semantic Segmentation	ADE20K without background	mIoU 21.8	72
Open Vocabulary Semantic Segmentation	PASCAL Context Context60 with background	mIoU 34.6	69
Open Vocabulary Semantic Segmentation	Cityscapes without background	mIoU 37.3	67
Open Vocabulary Semantic Segmentation	PASCAL Context 59 without background	mIoU 37.8	67

Showing 10 of 26 rows
