INSID3: Training-Free In-Context Segmentation with DINOv3
About
In-context segmentation (ICS) aims to segment arbitrary concepts, e.g., objects, parts, or personalized instances, given one annotated visual example. Existing work either (i) fine-tunes vision foundation models (VFMs), which improves in-domain results but harms generalization, or (ii) combines multiple frozen VFMs, which preserves generalization but introduces architectural complexity and fixed segmentation granularities. We revisit ICS from a minimalist perspective and ask: Can a single self-supervised backbone support both semantic matching and segmentation, without any supervision or auxiliary models? We show that scaled-up dense self-supervised features from DINOv3 exhibit strong spatial structure and semantic correspondence. We introduce INSID3, a training-free approach that segments concepts at varying granularities from frozen DINOv3 features alone, given an in-context example. INSID3 achieves state-of-the-art results across one-shot semantic, part, and personalized segmentation, outperforming previous work by +7.5% mIoU while using 3x fewer parameters and no mask or category-level supervision. Code is available at https://github.com/visinf/INSID3 .
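The core idea, matching dense query features against the concept marked in the reference image, can be sketched as follows. This is a minimal NumPy illustration of training-free prototype matching, not INSID3's exact pipeline: the function name, masked average pooling, and the fixed similarity threshold are assumptions, and real inputs would be frozen DINOv3 patch features.

```python
import numpy as np

def prototype_mask(ref_feats, ref_mask, query_feats, thresh=0.5):
    """Segment a query image by cosine similarity to a prototype pooled
    from the annotated reference. Feature maps are (H, W, C); ref_mask
    is a binary (H, W) array. Illustrative sketch only."""
    # L2-normalize dense features so dot products are cosine similarities.
    ref = ref_feats / (np.linalg.norm(ref_feats, axis=-1, keepdims=True) + 1e-8)
    qry = query_feats / (np.linalg.norm(query_feats, axis=-1, keepdims=True) + 1e-8)
    # Masked average pooling: a single prototype vector for the concept.
    proto = (ref * ref_mask[..., None]).sum(axis=(0, 1)) / (ref_mask.sum() + 1e-8)
    proto /= np.linalg.norm(proto) + 1e-8
    # Similarity of every query location to the prototype, then threshold.
    sim = qry @ proto
    return sim > thresh
```

With a frozen backbone, the only free choice here is the threshold; no weights are updated, which is what makes this style of approach training-free.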
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | COCO-20^i | mIoU (Mean) | 57.6 | 144 |
| Semantic segmentation | iSAID | mIoU | 56.9 | 122 |
| Semantic segmentation | LVIS-92^i | mIoU | 47.2 | 38 |
| Semantic segmentation | ISIC | mIoU | 63.9 | 35 |
| Semantic segmentation | SUIM | mIoU | 61.7 | 34 |
| Semantic segmentation | Chest X-ray | mIoU | 78.8 | 25 |
| Semantic segmentation | COCO-20^i | mIoU | 65.1 | 24 |
| Part segmentation | PASCAL-Part | mIoU | 50.5 | 22 |
| Semantic segmentation | iSAID-5^i | mIoU | 52.1 | 21 |
| Part segmentation | PACO-Part | mIoU | 38.7 | 17 |