
Textual Query-Driven Mask Transformer for Domain Generalized Segmentation

About

In this paper, we introduce a method to tackle Domain Generalized Semantic Segmentation (DGSS) by utilizing domain-invariant semantic knowledge from the text embeddings of vision-language models. We employ these text embeddings as object queries within a transformer-based segmentation framework (textual object queries), and regard them as a domain-invariant basis for pixel grouping in DGSS. To leverage the power of textual object queries, we introduce a novel framework named the textual query-driven mask transformer (tqdm). Our tqdm aims to (1) generate textual object queries that maximally encode domain-invariant semantics and (2) enhance the semantic clarity of dense visual features. Additionally, we propose three regularization losses that improve the efficacy of tqdm by aligning visual and textual features. With our method, the model captures the inherent semantics of the classes of interest, which enables it to generalize to extreme domains (e.g., sketch-style images). Our tqdm achieves 68.9 mIoU on GTA5 → Cityscapes, outperforming the prior state-of-the-art method by 2.5 mIoU. The project page is available at https://byeonghyunpak.github.io/tqdm.
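To make the idea of textual object queries concrete, here is a minimal sketch, assuming each class is represented by one text embedding and that mask logits are the cosine similarity between that embedding and each pixel's visual feature, divided by a temperature. This is not the tqdm implementation: it skips the transformer decoder that refines the queries, the embeddings are made-up 3-D vectors rather than real vision-language model outputs, and `pixel_text_alignment_loss` is a hypothetical, generic per-pixel cross-entropy in the spirit of the visual-textual alignment the abstract mentions, not one of the paper's three losses. All function names and the temperature value are illustrative assumptions.

```python
import math

# Sketch only (NOT the tqdm implementation): textual object queries used
# directly as mask classifiers over per-pixel visual features. A real
# mask-transformer decoder would first refine the queries with attention.


def normalize(v):
    """Scale a vector to unit L2 norm (assumes v is non-zero)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mask_logits(text_queries, pixel_features, tau=0.07):
    """Return logits[k][p] = cos(query_k, pixel_p) / tau.

    text_queries: one embedding per class (hypothetically from a VLM text encoder).
    pixel_features: one visual feature per pixel, flattened from H x W.
    tau: temperature; 0.07 is an assumed value, not taken from the paper.
    """
    qs = [normalize(q) for q in text_queries]
    ps = [normalize(p) for p in pixel_features]
    return [[dot(q, p) / tau for p in ps] for q in qs]


def segment(logits):
    """Group pixels by assigning each one to its highest-scoring query."""
    num_classes, num_pixels = len(logits), len(logits[0])
    return [max(range(num_classes), key=lambda k: logits[k][p])
            for p in range(num_pixels)]


def pixel_text_alignment_loss(logits, labels):
    """Hypothetical alignment loss: mean per-pixel cross-entropy over classes.

    Lowering it pulls each pixel feature toward the text embedding of its
    ground-truth class and away from the other classes' embeddings.
    """
    num_classes = len(logits)
    total = 0.0
    for p, y in enumerate(labels):
        scores = [logits[k][p] for k in range(num_classes)]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[y]
    return total / len(labels)


# Toy 3-D embeddings for the classes road, car, sky (made-up values).
queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixels = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9], [0.7, 0.0, 0.3]]
logits = mask_logits(queries, pixels)
print(segment(logits))  # [0, 1, 2, 0]
print(pixel_text_alignment_loss(logits, [0, 1, 2, 0]))
```

Because the class queries come from text rather than from images of any particular domain, the same query vectors can be reused unchanged on every target domain; only the pixel features change.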

Byeonghyun Pak, Byeongju Woo, Sunghwan Kim, Dae-hwan Kim, Hoseong Kim • 2024

Related benchmarks

Task | Dataset | Result | Rank
Semantic segmentation | GTA5 → Cityscapes (val) | mIoU 68.9 | 533
Semantic segmentation | SYNTHIA to Cityscapes (val) | -- | 435
Semantic segmentation | GTA5 to {Cityscapes, Mapillary, BDD} (test) | mIoU (Cityscapes) 68.88 | 94
Semantic segmentation | CityScapes, BDD, and Mapillary (val) | Mean mIoU 66.05 | 85
Semantic segmentation | Mapillary | mIoU 76.15 | 75
Semantic segmentation | GTA5 → {Cityscapes, BDD100K, Mapillary} (Target Domains) | Score (Cityscapes) 68.88 | 36
Semantic segmentation | Mapillary Vistas | mIoU 70.1 | 22
Semantic segmentation | SYNTHIA to {Cityscapes, Mapillary, BDDS} (test) | mIoU (Cityscapes) 57.99 | 21
Semantic segmentation | Cityscapes, BDD100K, and Mapillary Aggregate (test) | mIoU 70.44 | 21
Semantic segmentation | Cityscapes → BDD100K & Mapillary | Average mIoU 70.44 | 19
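The results in the benchmark table are reported as mIoU (mean intersection-over-union, in percent). As a refresher, the sketch below computes mIoU from a confusion matrix and averages per-domain scores for the multi-domain rows, assuming that "Mean mIoU" and "Average mIoU" denote an unweighted mean over target domains (the table itself does not say). The confusion matrix and the per-domain scores are made-up toy numbers, not results from the paper, and the function names are illustrative.

```python
def miou(confusion):
    """Mean IoU (in %) from a K x K confusion matrix.

    confusion[g][p] counts pixels with ground-truth class g predicted as p.
    IoU_k = TP_k / (TP_k + FP_k + FN_k); classes with an empty union are skipped.
    """
    k = len(confusion)
    ious = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(confusion[r][c] for r in range(k)) - tp  # other-class pixels predicted as class c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return 100.0 * sum(ious) / len(ious)


def mean_over_domains(per_domain_miou):
    """Assumed aggregation for multi-domain rows: unweighted mean of per-domain mIoU."""
    return sum(per_domain_miou) / len(per_domain_miou)


# Toy two-class confusion matrix (made-up pixel counts).
conf = [[50, 10],
        [5, 35]]
print(round(miou(conf), 2))  # 73.46

# Hypothetical per-domain mIoU values for three target domains.
print(mean_over_domains([60.0, 70.0, 80.0]))  # 70.0
```

Under this assumption, a gain on one target domain moves a multi-domain average by only a third as much when three domains are averaged, which is why single-domain and aggregate rows are not directly comparable.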
