A Simple Image Segmentation Framework via In-Context Examples
About
Recent work has explored generalist segmentation models that tackle a variety of image segmentation tasks within a unified in-context learning framework. However, these methods still struggle with task ambiguity in in-context segmentation, because not all in-context examples accurately convey the task information. To address this issue, we present SINE, a simple image Segmentation framework utilizing in-context examples. Our approach uses a Transformer encoder-decoder structure, where the encoder provides high-quality image representations and the decoder yields multiple task-specific output masks to eliminate task ambiguity. Specifically, we introduce an In-context Interaction module, which complements in-context information and produces correlations between the target image and the in-context example, and a Matching Transformer, which uses fixed matching and a Hungarian algorithm to eliminate differences between different tasks. In addition, we refine the current evaluation system for in-context image segmentation to facilitate a holistic appraisal of these models. Experiments on various segmentation tasks show the effectiveness of the proposed method.
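
The matching step mentioned in the abstract can be illustrated with a small, hedged sketch. It pairs predicted masks with ground-truth masks one-to-one so that the total cost is minimal, which is the problem the Hungarian algorithm solves. This is not the SINE implementation: the cost (1 - IoU), the function names `iou` and `match_masks`, and the toy masks are assumptions made for illustration, and the optimal assignment is found by brute force over permutations rather than by the Hungarian algorithm itself (for real workloads, `scipy.optimize.linear_sum_assignment` solves the same problem efficiently).

```python
from itertools import permutations


def iou(a, b):
    """IoU of two binary masks given as equal-length flat lists of 0/1."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def match_masks(preds, targets):
    """Return (pred_index, target_index) pairs minimizing total (1 - IoU).

    Brute-force search over assignments; assumes len(preds) >= len(targets)
    and only a handful of masks. The cost choice is an assumption.
    """
    cost = [[1.0 - iou(p, t) for t in targets] for p in preds]
    best, best_cost = None, float("inf")
    # Each permutation assigns one distinct prediction to every target.
    for perm in permutations(range(len(preds)), len(targets)):
        total = sum(cost[p][t] for t, p in enumerate(perm))
        if total < best_cost:
            best, best_cost = perm, total
    return [(p, t) for t, p in enumerate(best)]


# Toy example: three predicted masks, two ground-truth masks (4 pixels each).
preds = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
targets = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
pairs = match_masks(preds, targets)
print(pairs)
```

In this toy case the second prediction is matched to the first target and the first prediction to the second target, while the third prediction stays unmatched.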
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | COCO-20i | mIoU (Mean) | 64.5 | 144 |
| Semantic segmentation | iSAID | mIoU | 40.5 | 122 |
| Semantic segmentation | LVIS 92^i | mIoU | 35.5 | 38 |
| Semantic segmentation | ISIC | mIoU | 28.6 | 35 |
| Semantic segmentation | SUIM | mIoU | 54.8 | 34 |
| Semantic segmentation | Chest X-ray | mIoU | 39.8 | 25 |
| Semantic segmentation | COCO-20^i | mIoU | 66.1 | 24 |
| Part Segmentation | PASCAL-Part | mIoU | 36.2 | 22 |
| Semantic segmentation | iSAID 5i | mIoU | 38.3 | 21 |
| Part Segmentation | PACO-Part | mIoU | 23.3 | 17 |
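
All results in the table are reported as mIoU (mean intersection over union). As a rough illustration of the metric, the sketch below computes per-class IoU on flat label maps and averages over the classes that appear in either map. The helper names `class_iou` and `mean_iou` and the toy labels are hypothetical, and the averaging protocol of each benchmark (for example over folds or episodes) may differ from this simple version.

```python
def class_iou(pred, target, cls):
    """IoU for one class on flat label lists; None if the class is absent."""
    inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    return inter / union if union else None


def mean_iou(pred, target, classes):
    """Average IoU over classes that occur in the prediction or the target."""
    scores = [class_iou(pred, target, c) for c in classes]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
miou = mean_iou(pred, target, classes=[0, 1, 2])
print(f"mIoU = {100 * miou:.1f}")
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5, which would be reported as 50.0 on the percentage scale used in the table.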