PEARL: Geometry Aligns Semantics for Training-Free Open-Vocabulary Semantic Segmentation
About
Training-free open-vocabulary semantic segmentation (OVSS) promises rapid adaptation to new label sets without retraining. Yet many methods rely on heavy post-processing or treat text and vision in isolation, which leaves cross-modal geometry underutilized. Others introduce auxiliary vision backbones or multi-model pipelines, which add complexity and latency at the cost of design simplicity. We present PEARL (Procrustes alignment with text-aware Laplacian propagation), a compact two-step inference procedure that follows an align-then-propagate principle. The Procrustes alignment step performs an orthogonal projection inside the last self-attention block, rotating keys toward the query subspace via a stable polar iteration. The text-aware Laplacian propagation step then refines per-pixel logits on a small grid through a confidence-weighted, text-guided graph solve: text provides both a data-trust signal and neighbor gating, while image gradients preserve boundaries. The method is fully training-free and plug-and-play, uses only fixed constants, and adds minimal latency: a small per-head projection and a few conjugate-gradient steps. Without extra data or auxiliary backbones, PEARL sets a new state of the art in training-free OVSS across standard benchmarks, under both the with-background and without-background protocols.
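A minimal NumPy sketch of the alignment step, under stated assumptions: we read "rotating keys toward the query subspace" as the orthogonal Procrustes problem, minimizing ||K R - Q||_F over orthogonal R for the keys K and queries Q of one attention head. Its closed-form solution is the orthogonal polar factor of K^T Q, and here the Newton-Schulz iteration stands in for the "stable polar iteration". The helper names (`polar_newton_schulz`, `procrustes_rotate_keys`), the Frobenius-norm pre-scaling, and the iteration count are illustrative choices, not the authors' code.

```python
import numpy as np


def polar_newton_schulz(m: np.ndarray, num_iters: int = 30) -> np.ndarray:
    """Orthogonal polar factor U V^T of a square matrix M = U S V^T."""
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # inside the Newton-Schulz convergence region (0, sqrt(3)).
    x = m / np.linalg.norm(m)
    for _ in range(num_iters):
        # Maps each singular value s -> 1.5 s - 0.5 s^3 (fixed point at 1)
        # while leaving the singular vectors unchanged.
        x = 1.5 * x - 0.5 * (x @ x.T @ x)
    return x


def procrustes_rotate_keys(
    q: np.ndarray, k: np.ndarray, num_iters: int = 30
) -> tuple[np.ndarray, np.ndarray]:
    """Rotate one head's keys K (N x d) toward its queries Q (N x d).

    Solves R = argmin_{R^T R = I} ||K R - Q||_F, whose closed form is the
    orthogonal polar factor of K^T Q. Hypothetical helper, not the
    authors' implementation.
    """
    r = polar_newton_schulz(k.T @ q, num_iters)
    return k @ r, r
```

Newton-Schulz uses only matrix products, which keeps it GPU-friendly and avoids an explicit SVD. Convergence slows when K^T Q is close to singular, since small singular values grow by only about 1.5x per step, so such inputs may need more iterations.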
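The propagation step can be sketched as a weighted least-squares problem on the logit grid. The following minimal NumPy version assumes that the refined logits Y minimize sum_i c_i ||y_i - x_i||^2 + lam * sum_(i,j) w_ij ||y_i - y_j||^2 over a 4-connected grid, which yields the symmetric positive definite system (C + lam L) Y = C X, solved per class with conjugate gradients. Every specific choice here is an assumption, not the paper's formulation: c_i is the maximum softmax probability (the data-trust signal), w_ij multiplies an image-gradient term exp(-||I_i - I_j||^2 / sigma^2) by a text gate sum_k p_ik p_jk (agreement between neighboring class distributions), and `lam`, `sigma`, `num_cg`, and all function names are hypothetical placeholders.

```python
import numpy as np

# Sketch under stated assumptions; not PEARL's actual weights or constants.
_TINY = np.finfo(np.float64).tiny  # avoids 0/0 once a class is already solved


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def grid_edge_weights(image, probs, sigma=0.1):
    """Weights of horizontal (H, W-1) and vertical (H-1, W) grid edges.

    image: (H, W, 3) in [0, 1]; probs: (H, W, C) text-class probabilities.
    """
    # Boundary term: weights decay across strong image gradients.
    dx = np.sum((image[:, 1:] - image[:, :-1]) ** 2, axis=-1)
    dy = np.sum((image[1:] - image[:-1]) ** 2, axis=-1)
    # Assumed text gate: agreement of neighboring class distributions.
    gx = np.sum(probs[:, 1:] * probs[:, :-1], axis=-1)
    gy = np.sum(probs[1:] * probs[:-1], axis=-1)
    return np.exp(-dx / sigma**2) * gx, np.exp(-dy / sigma**2) * gy


def laplacian_apply(y, wx, wy):
    """Matrix-free L @ y with (L y)_i = sum_j w_ij (y_i - y_j); y: (H, W, C)."""
    out = np.zeros_like(y)
    fx = (y[:, 1:] - y[:, :-1]) * wx[..., None]  # flow along horizontal edges
    fy = (y[1:] - y[:-1]) * wy[..., None]  # flow along vertical edges
    out[:, :-1] -= fx
    out[:, 1:] += fx
    out[:-1] -= fy
    out[1:] += fy
    return out


def propagate(logits, image, lam=1.0, sigma=0.1, num_cg=10):
    """Solve (C + lam L) Y = C X for every class with conjugate gradients."""
    x = np.asarray(logits, dtype=np.float64)
    probs = softmax(x)
    conf = probs.max(axis=-1, keepdims=True)  # assumed data-trust signal c_i
    wx, wy = grid_edge_weights(image, probs, sigma)

    def a_op(v):
        return conf * v + lam * laplacian_apply(v, wx, wy)

    b = conf * x
    y = x.copy()  # warm start at the raw logits
    r = b - a_op(y)
    p = r.copy()
    rs = np.sum(r * r, axis=(0, 1))  # one squared residual norm per class
    for _ in range(num_cg):
        ap = a_op(p)
        alpha = rs / np.maximum(np.sum(p * ap, axis=(0, 1)), _TINY)
        y = y + alpha * p
        r = r - alpha * ap
        rs_new = np.sum(r * r, axis=(0, 1))
        p = r + (rs_new / np.maximum(rs, _TINY)) * p
        rs = rs_new
    return y
```

Because every c_i is strictly positive and L is positive semidefinite, C + lam L is symmetric positive definite, which is the condition plain conjugate gradients needs. Warm-starting at the raw logits is a natural choice when only a few iterations are run.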
Related benchmarks
| Task | Dataset | mIoU | Rank |
|---|---|---|---|
| Open Vocabulary Semantic Segmentation | Pascal VOC 20 | 86.9 | 104 |
| Open Vocabulary Semantic Segmentation | Pascal Context PC-59 | 38.6 | 89 |
| Open Vocabulary Semantic Segmentation | ADE20K without background | 19.4 | 72 |
| Open Vocabulary Semantic Segmentation | COCO Stuff without background | 26.3 | 71 |
| Open Vocabulary Semantic Segmentation | PASCAL Context Context60 with background | 35.1 | 69 |
| Open Vocabulary Semantic Segmentation | COCO Object with background | 37.3 | 68 |
| Open Vocabulary Semantic Segmentation | Cityscapes without background | 37.6 | 67 |
| Open Vocabulary Semantic Segmentation | PASCAL Context 59 without background | 38.6 | 67 |
| Open Vocabulary Semantic Segmentation | Cityscapes | 37.6 | 43 |
| Open Vocabulary Semantic Segmentation | ADE20K | 19.4 | 42 |