Excite, Attend and Segment (EASe): Domain-Agnostic Fine-Grained Mask Discovery with Feature Calibration and Self-Supervised Upsampling
About
Unsupervised segmentation approaches have increasingly leveraged foundation models (FMs) to improve salient object discovery. However, these methods often falter in scenes with complex, multi-component morphologies, where fine-grained structural detail is indispensable. Many state-of-the-art unsupervised segmentation pipelines rely on mask discovery approaches that use coarse, patch-level representations, which inherently suppress the fine-grained detail required to resolve such morphologies. To overcome this limitation, we propose Excite, Attend and Segment (EASe), an unsupervised, domain-agnostic semantic segmentation framework for easy fine-grained mask discovery across challenging real-world scenes. EASe uses a novel Semantic-Aware Upsampling with Channel Excitation (SAUCE) module to excite low-resolution FM feature channels for selective calibration, and attends across spatially encoded image and FM features to recover full-resolution semantic representations. Finally, EASe segments the aggregated features into multi-granularity masks using a novel training-free Cue-Attentive Feature Aggregator (CAFE), which leverages SAUCE attention scores as a semantic grouping signal. EASe, together with SAUCE and CAFE, operates directly on pixel-level feature representations to enable accurate, fine-grained, dense semantic mask discovery. Our evaluation demonstrates superior performance of EASe over previous state-of-the-art (SOTA) methods across major standard benchmarks and diverse datasets with complex morphologies. Code is available at https://ease-project.github.io
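To make the phrase "excite feature channels for selective calibration" concrete, the following is a minimal sketch of generic squeeze-and-excitation-style channel gating. It is an illustration under stated assumptions, not the SAUCE module itself: the abstract does not specify the gating architecture, and the function name `channel_excite` and the weight matrices `w1` and `w2` are hypothetical stand-ins for parameters that would normally be learned.

```python
import math


def channel_excite(features, w1, w2):
    """Generic SE-style channel gating (illustrative; not the paper's SAUCE).

    features: list of C channels, each an H x W nested list of floats.
    w1: hidden x C weight matrix, w2: C x hidden weight matrix
        (hypothetical parameters; a real model would learn them).
    Returns the rescaled features and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    # Excite: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * z for w, z in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Calibrate: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, features)]
    return scaled, gates
```

With all-zero weights every gate is `sigmoid(0) = 0.5`, so each channel is halved; learned weights would instead emphasize some channels and suppress others.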
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Unsupervised Semantic Segmentation | Cityscapes | mIoU 32.8 | 25 |
| Unsupervised Semantic Segmentation | Pascal VOC | mIoU 0.636 | 9 |
| Unsupervised Semantic Segmentation | COCO Object | mIoU 43.2 | 6 |
| Unsupervised Semantic Segmentation | COCO Stuff | mIoU 50.9 | 6 |
| Unsupervised Semantic Segmentation | ADE20K | mIoU 49.4 | 6 |
| Unsupervised Semantic Segmentation | PartImageNet | mIoU 46.4 | 4 |
| Unsupervised Semantic Segmentation | KITTI | mIoU 36 | 4 |
| Unsupervised Semantic Segmentation | OmniCrack30K | mIoU 54.2 | 4 |
| Unsupervised Semantic Segmentation | Roof Subassembly Damage Detection | mIoU 58.4 | 4 |
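The results above are reported as mean intersection-over-union (mIoU). As a minimal sketch of how that metric is commonly computed, the function below averages per-class IoU over flattened label maps. The function name `mean_iou` and the `ignore_index=255` convention are assumptions for illustration; the exact evaluation protocol behind these numbers is not stated here. Unsupervised methods usually first match predicted clusters to ground-truth classes (for example with Hungarian matching), and this sketch assumes that matching has already been done.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, target: flat sequences of integer class ids of equal length.
    Pixels whose ground-truth label equals ignore_index are skipped
    (an assumed convention, not taken from the source).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # A correct pixel adds to both intersection and union of its class.
            inter[t] += 1
            union[t] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[t] += 1
            if 0 <= p < num_classes:
                union[p] += 1
    # Classes absent from both maps have an empty union and are left out.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the function returns 7/12 (about 0.583, or 58.3 on a 0 to 100 scale).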