INSID3: Training-Free In-Context Segmentation with DINOv3
About
In-context segmentation (ICS) aims to segment arbitrary concepts, e.g., objects, parts, or personalized instances, given one annotated visual example. Existing work either (i) fine-tunes vision foundation models (VFMs), which improves in-domain results but harms generalization, or (ii) combines multiple frozen VFMs, which preserves generalization but introduces architectural complexity and fixed segmentation granularities. We revisit ICS from a minimalist perspective and ask: Can a single self-supervised backbone support both semantic matching and segmentation, without any supervision or auxiliary models? We show that scaled-up dense self-supervised features from DINOv3 exhibit strong spatial structure and semantic correspondence. We introduce INSID3, a training-free approach that segments concepts at varying granularities from frozen DINOv3 features alone, given an in-context example. INSID3 achieves state-of-the-art results across one-shot semantic, part, and personalized segmentation, outperforming previous work by +7.5% mIoU while using 3x fewer parameters and no mask or category-level supervision. Code is available at https://github.com/visinf/INSID3 .
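The core idea, matching dense query features against the concept marked in the reference image, can be sketched as follows. This is a minimal NumPy illustration of training-free prototype matching, not INSID3's exact pipeline: the function name, masked average pooling, and the fixed similarity threshold are assumptions, and real inputs would be frozen DINOv3 patch features.

```python
import numpy as np

def prototype_mask(ref_feats, ref_mask, query_feats, thresh=0.5):
    """Segment a query image by cosine similarity to a prototype pooled
    from the annotated reference. Feature maps are (H, W, C); ref_mask
    is a binary (H, W) array. Illustrative sketch only."""
    # L2-normalize dense features so dot products are cosine similarities.
    ref = ref_feats / (np.linalg.norm(ref_feats, axis=-1, keepdims=True) + 1e-8)
    qry = query_feats / (np.linalg.norm(query_feats, axis=-1, keepdims=True) + 1e-8)
    # Masked average pooling: a single prototype vector for the concept.
    proto = (ref * ref_mask[..., None]).sum(axis=(0, 1)) / (ref_mask.sum() + 1e-8)
    proto /= np.linalg.norm(proto) + 1e-8
    # Similarity of every query location to the prototype, then threshold.
    sim = qry @ proto
    return sim > thresh
```

With a frozen backbone, the only free choice here is the threshold; no weights are updated, which is what makes this style of approach training-free.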
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | COCO-20^i | mIoU (Mean) | 57.6 | 144 |
| Semantic segmentation | iSAID | mIoU | 56.9 | 122 |
| Semantic segmentation | LVIS-92^i | mIoU | 47.2 | 38 |
| Semantic segmentation | ISIC | mIoU | 63.9 | 35 |
| Semantic segmentation | SUIM | mIoU | 61.7 | 34 |
| Semantic segmentation | Chest X-ray | mIoU | 78.8 | 25 |
| Semantic segmentation | COCO-20^i | mIoU | 65.1 | 24 |
| Part segmentation | PASCAL-Part | mIoU | 50.5 | 22 |
| Semantic segmentation | iSAID-5^i | mIoU | 52.1 | 21 |
| Part segmentation | PACO-Part | mIoU | 38.7 | 17 |