Vision Harnessing Agent for Open Ad-hoc Segmentation

About

Segmentation has become easy when the concept is known, requiring retrieval of a learned visual grounding from text. It remains hard for open ad-hoc concepts, where the grounding may not exist as one learned mask and must often be constructed from image evidence through parts, relations, exclusions, and collections. We propose a Vision-guided Ad-hoc Segmentation Agent (VASA), the first vision harnessing agent for open ad-hoc segmentation. VASA is training-free and couples a VLM agent, a segmentation foundation model, and a visually grounded workflow. Rather than revising text prompts alone, VASA uses a persistent working mask to reason, construct, and validate a solution. It plans visual operations, invokes segmentation tools, inspects results, edits the mask, and recovers from errors. We construct PARS, a new benchmark that turns part-level labels in PartImageNet into open ad-hoc concepts through long-form definition queries. On PARS, VASA outperforms open-vocabulary, reasoning-based, and agentic baselines, surpassing SAM3 Agent by 14-25%. On RefCOCOm, a standard multi-granularity referring segmentation benchmark, VASA improves over SAM3 Agent by 5-9% and over other agentic baselines by up to 20%. These results validate agentic visual construction for open ad-hoc segmentation. Our work points to a path for AI agents beyond wrapping foundation models as tools: programming them with task knowledge, VLM behavior, visual routines, working memory, and failure-aware workflows.
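To make the idea of a persistent working mask concrete, here is a minimal sketch of how parts, exclusions, and error recovery could be expressed as edits on one boolean mask. The `WorkingMask` class and its `add`, `exclude`, and `undo` methods are hypothetical names chosen for illustration, and the part masks are hand-made toy arrays standing in for segmentation-tool outputs; none of this is taken from the VASA implementation.

```python
import numpy as np


class WorkingMask:
    """Hypothetical persistent boolean mask that an agent edits step by step."""

    def __init__(self, height, width):
        self.mask = np.zeros((height, width), dtype=bool)
        self.history = []  # earlier mask states, kept so an edit can be undone

    def _commit(self, new_mask):
        # Save the current state before replacing it.
        self.history.append(self.mask)
        self.mask = new_mask

    def add(self, part):
        # Collection: merge a part mask into the working mask.
        self._commit(self.mask | part)

    def exclude(self, part):
        # Exclusion: carve a part mask out of the working mask.
        self._commit(self.mask & ~part)

    def undo(self):
        # Recovery: roll back the most recent edit, if there is one.
        if self.history:
            self.mask = self.history.pop()


# Toy part masks on a 4x4 image (stand-ins for segmentation-tool outputs).
head = np.zeros((4, 4), dtype=bool)
head[0:2, 1:3] = True
torso = np.zeros((4, 4), dtype=bool)
torso[2:4, 0:4] = True
eye = np.zeros((4, 4), dtype=bool)
eye[0, 1] = True

wm = WorkingMask(4, 4)
wm.add(head)      # start from one part
wm.add(torso)     # collect a second part
wm.exclude(eye)   # remove a sub-part
pixels_after_exclude = int(wm.mask.sum())
wm.undo()         # the exclusion is judged wrong and reverted
pixels_after_undo = int(wm.mask.sum())
```

Keeping every earlier state in `history` is the simplest way to make edits reversible; a real system would more likely store a bounded log of operations.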

Zilin Wang, Stella X. Yu • 2026

Related benchmarks

Task	Dataset	Result
Open-Vocabulary Part Segmentation	PARS Ad-hoc Concepts	gIoU 56.9	28
Multi-granularity Referring Expression Segmentation	RefCOCOm (val)	gIoU (Part) 45	14
Multi-granularity Referring Expression Segmentation	RefCOCOm (testA)	gIoU (Part) 43.2	14
Multi-granularity Referring Expression Segmentation	RefCOCOm (testB)	gIoU (Part) 47.6	14
Open-Vocabulary Part Segmentation	PARS Common Concepts	gIoU 60.8	14
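The table reports gIoU without defining it. In referring-segmentation work, gIoU usually denotes the mean of per-image IoU values, and the sketch below follows that reading. This is an assumption about the metric, not a definition taken from this page; the function names `iou` and `giou` and the rule that an empty prediction against an empty ground truth scores 1.0 are illustrative choices as well.

```python
import numpy as np


def iou(pred, gt):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: both masks empty counts as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def giou(preds, gts):
    """Mean of per-image IoU over a dataset (assumed meaning of gIoU)."""
    return float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))


# Two toy 2x2 images: the first prediction half-overlaps, the second is exact.
preds = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
gts = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
score = giou(preds, gts)  # (0.5 + 1.0) / 2
```

Because each image contributes equally, this average weights small parts as heavily as large objects, which matters for part-level benchmarks.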

