Direct Segmentation without Logits Optimization for Training-Free Open-Vocabulary Semantic Segmentation

About

Open-vocabulary semantic segmentation (OVSS) aims to segment regions of arbitrary categories in images using open-vocabulary prompts, which requires pixel-level vision-language alignment. Typically, this alignment is obtained by computing the cosine similarity, i.e., logits, between visual and linguistic features and minimizing the distribution discrepancy between the logits and the ground truth (GT); the resulting optimized logits are then used to construct segmentation maps. This process, however, depends on time-consuming iterative training or model-specific attention modulation. In this work, we propose a more direct approach that skips logits optimization and instead derives an analytic solution for the segmentation map. We posit a key hypothesis: the distribution discrepancy encodes semantic information. Specifically, the discrepancy is consistent across patches belonging to the same category but inconsistent across different categories. Based on this hypothesis, we use the analytic solution of the distribution discrepancy directly as the semantic maps. In other words, we reformulate the optimization of the distribution discrepancy as the derivation of its analytic solution. This eliminates time-consuming iterative training, removes the need for model-specific attention modulation, and achieves state-of-the-art performance on eight benchmark datasets.
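To make the conventional starting point concrete, the sketch below computes cosine-similarity logits between patch features and prompt features and takes the per-patch argmax as a segmentation map. It illustrates only the standard logits step described above, not the paper's analytic solution, which the abstract does not specify. The function names, the toy feature vectors, and the 2x2 patch grid are all assumptions made for illustration; real features would come from a vision-language encoder.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def patch_logits(patch_features, text_features):
    """One row of logits per patch, one column per category prompt."""
    return [[cosine_similarity(p, t) for t in text_features]
            for p in patch_features]

def segmentation_map(logits, grid_width):
    """Give each patch its highest-scoring category, reshaped to a 2-D grid."""
    labels = [max(range(len(row)), key=row.__getitem__) for row in logits]
    return [labels[i:i + grid_width] for i in range(0, len(labels), grid_width)]

# Hypothetical toy data: a 2x2 patch grid with 3-D features and two prompts.
patches = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0],
           [0.0, 0.1, 1.0], [0.1, 0.0, 0.8]]
prompts = [[1.0, 0.0, 0.0],   # stand-in for the first category prompt
           [0.0, 0.0, 1.0]]   # stand-in for the second category prompt

logits = patch_logits(patches, prompts)
seg = segmentation_map(logits, grid_width=2)
print(seg)  # [[0, 0], [1, 1]]
```

In this toy case the top row of patches aligns with the first prompt and the bottom row with the second. Training-based methods would further optimize such logits against the GT, which is the step the proposed approach avoids.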

Jiahao Li, Yang Lu, Yachao Zhang, Fangyong Wang, Yuan Xie, Yanyun Qu • 2026

Related benchmarks

Task	Dataset	Result
Semantic segmentation	ADE20K	mIoU 23.4	699
Semantic segmentation	Cityscapes	mIoU 43.9	526
Semantic segmentation	COCO Stuff	mIoU 29.2	421
Semantic segmentation	PC-59	mIoU 45.3	174
Semantic segmentation	COCO Object	mIoU 42.9	147
Semantic segmentation	Pascal Context 60	mIoU 38.7	139
Semantic segmentation	VOC-20	mIoU 90.1	121
Semantic segmentation	VOC21	mIoU 68.9	108
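
For reference, the mIoU metric reported above averages the per-class intersection-over-union between predicted and ground-truth labels. The sketch below is a minimal illustration on flat label lists, under the assumption that classes absent from both prediction and ground truth are skipped. The function name and toy labels are hypothetical; benchmark implementations typically accumulate counts over a whole dataset and report the result as a percentage.

```python
def mean_iou(pred, target, num_classes):
    """Mean of per-class IoU over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither list
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Hypothetical flattened label maps for six pixels and two classes.
pred = [0, 0, 1, 1, 1, 0]
target = [0, 0, 1, 1, 0, 0]

score = mean_iou(pred, target, num_classes=2)
print(round(100 * score, 1))  # 70.8
```

Here class 0 has IoU 3/4 and class 1 has IoU 2/3, giving a mean of about 0.708.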

