Diffusion Models for Open-Vocabulary Segmentation
About
Open-vocabulary segmentation is the task of segmenting anything that can be named in an image. Recently, large-scale vision-language modelling has led to significant advances in open-vocabulary segmentation, but at the cost of gargantuan and increasing training and annotation efforts. Hence, we ask if it is possible to use existing foundation models to synthesise on-demand efficient segmentation algorithms for specific class sets, making them applicable in an open-vocabulary setting without the need to collect further data, annotations or perform training. To that end, we present OVDiff, a novel method that leverages generative text-to-image diffusion models for unsupervised open-vocabulary segmentation. OVDiff synthesises support image sets for arbitrary textual categories, creating for each a set of prototypes representative of both the category and its surrounding context (background). It relies solely on pre-trained components and outputs the synthesised segmenter directly, without training. Our approach shows strong performance on a range of benchmarks, obtaining a lead of more than 5% over prior work on PASCAL VOC.
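The abstract describes segmentation by comparing an image against per-category prototypes for both the category and its background. The sketch below illustrates that idea only and is not the OVDiff implementation. It assumes per-pixel features and prototypes have already been extracted by some pre-trained encoder (not shown), and it assumes each pixel takes the label of its most cosine-similar prototype. The function name `segment_with_prototypes` and all argument names are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def segment_with_prototypes(features, fg_prototypes, bg_prototypes):
    """Assign each pixel to the class of its nearest prototype (illustrative only).

    features:      (H, W, D) array of per-pixel features (assumed precomputed).
    fg_prototypes: dict mapping class name -> (K, D) array of category prototypes.
    bg_prototypes: dict mapping class name -> (K, D) array of context prototypes.
    Returns an (H, W) integer mask (0 = background, i = i-th class name)
    together with the ordered list of class names.
    """
    class_names = list(fg_prototypes)
    h, w, d = features.shape
    feats = l2_normalize(features.reshape(-1, d))

    protos, labels = [], []
    for idx, name in enumerate(class_names, start=1):
        protos.append(fg_prototypes[name])
        labels += [idx] * len(fg_prototypes[name])
        bg = bg_prototypes.get(name)
        if bg is not None:
            # Background prototypes of every class all map to label 0.
            protos.append(bg)
            labels += [0] * len(bg)

    protos = l2_normalize(np.concatenate(protos, axis=0))
    labels = np.asarray(labels)

    sims = feats @ protos.T  # (H*W, P) cosine similarities
    mask = labels[sims.argmax(axis=1)].reshape(h, w)
    return mask, class_names


# Toy usage with hand-made 3-D "features" standing in for real encoder outputs.
fg = {"cat": np.array([[1.0, 0.0, 0.0]]), "dog": np.array([[0.0, 1.0, 0.0]])}
bg = {"cat": np.array([[0.0, 0.0, 1.0]])}
feats = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                  [[0.0, 0.0, 1.0], [0.9, 0.1, 0.0]]])
mask, names = segment_with_prototypes(feats, fg, bg)
```

Because the background prototypes compete with the category prototypes in the same similarity search, a pixel that looks more like a category's typical surroundings than like the category itself falls back to label 0, with no threshold to tune.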
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | VOC21 | mIoU 66.3 | 65 |
| Open Vocabulary Semantic Segmentation | Pascal VOC 20 | mIoU 81.7 | 62 |
| Semantic segmentation | PC-59 | mIoU 32.9 | 38 |
| Semantic segmentation | ADE | mIoU 14.1 | 32 |
| Open Vocabulary Semantic Segmentation | PASCAL Context (Context60, with background) | mIoU 29.7 | 28 |
| Open Vocabulary Semantic Segmentation | ADE20K (without background) | mIoU 14.1 | 28 |
| Open Vocabulary Semantic Segmentation | COCO Object (with background) | mIoU 34.6 | 27 |
| Open Vocabulary Semantic Segmentation | COCO Stuff (without background) | mIoU 20.3 | 27 |
| Open Vocabulary Semantic Segmentation | Cityscapes (without background) | mIoU 23.4 | 26 |
| Open Vocabulary Semantic Segmentation | PASCAL VOC 2012 (VOC20, without background) | mIoU 80.9 | 24 |
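
All results above are reported as mIoU (mean intersection over union). As a reminder of what the metric measures, here is a minimal sketch for a single predicted mask and its ground truth. Benchmark protocols typically accumulate intersections and unions over the whole dataset before averaging, which this sketch does not do. The function name `mean_iou` and the `ignore_index=255` convention are assumptions made for illustration.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes present in either mask (single-image sketch)."""
    valid = target != ignore_index  # drop pixels marked as "ignore"
    ious = []
    for c in range(num_classes):
        p = (pred == c) & valid
        t = (target == c) & valid
        union = (p | t).sum()
        if union == 0:
            continue  # class absent from both masks: skip rather than count as 0
        ious.append((p & t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")


pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.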