Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting

About

Reasoning Segmentation requires models to interpret complex, context-dependent linguistic queries to achieve pixel-level localization. Current dominant approaches rely heavily on Supervised Fine-Tuning (SFT) or Reinforcement Learning (RL). However, SFT suffers from catastrophic forgetting and domain dependency, while RL is often hindered by training instability and rigid reliance on predefined reward functions. Although recent training-free methods circumvent these training burdens, they are fundamentally limited by a static inference paradigm. These methods typically rely on a single-pass "generate-then-segment" chain, which suffers from insufficient reasoning depth and lacks the capability to self-correct linguistic hallucinations or spatial misinterpretations. In this paper, we challenge these limitations and propose EVOL-SAM3, a novel zero-shot framework that reformulates reasoning segmentation as an inference-time evolutionary search process. Instead of relying on a fixed prompt, EVOL-SAM3 maintains a population of prompt hypotheses and iteratively refines them through a "Generate-Evaluate-Evolve" loop. We introduce a Visual Arena to assess prompt fitness via reference-free pairwise tournaments, and a Semantic Mutation operator to inject diversity and correct semantic errors. Furthermore, a Heterogeneous Arena module integrates geometric priors with semantic reasoning to ensure robust final selection. Extensive experiments demonstrate that EVOL-SAM3 not only substantially outperforms static baselines but also significantly surpasses fully supervised state-of-the-art methods on the challenging ReasonSeg benchmark in a zero-shot setting. The code is available at https://github.com/AHideoKuzeA/Evol-SAM3.
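The "Generate-Evaluate-Evolve" loop described above can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation: the names `evolve`, `tournament_scores`, `toy_judge`, `toy_mutate`, and `toy_score` are hypothetical, the judge is a toy keyword scorer standing in for the Visual Arena (which in the paper compares segmentation results without a reference, whereas this toy uses a hidden target purely so the example runs), and the mutation is a random word edit standing in for the Semantic Mutation operator. The Heterogeneous Arena is not modeled.

```python
import random
from typing import Callable, List

Judge = Callable[[str, str], str]             # returns the preferred of two prompts
Mutate = Callable[[str, random.Random], str]  # returns a perturbed prompt

def tournament_scores(population: List[str], judge: Judge) -> List[int]:
    """Round-robin pairwise tournament: one point per comparison won."""
    scores = [0] * len(population)
    for i in range(len(population)):
        for j in range(i + 1, len(population)):
            winner = judge(population[i], population[j])
            scores[i if winner == population[i] else j] += 1
    return scores

def evolve(seeds: List[str], judge: Judge, mutate: Mutate,
           generations: int = 8, size: int = 6, keep: int = 2,
           seed: int = 0) -> str:
    """Return the best prompt found after a fixed number of generations."""
    rng = random.Random(seed)
    population = list(seeds)
    for _ in range(generations):
        # Generate: fill the population with mutated copies of current members.
        while len(population) < size:
            population.append(mutate(rng.choice(population), rng))
        # Evaluate: rank prompts by pairwise tournament wins.
        scores = tournament_scores(population, judge)
        ranked = [p for _, p in sorted(zip(scores, population),
                                       key=lambda sp: sp[0], reverse=True)]
        # Evolve: keep only the top prompts as parents of the next generation.
        population = ranked[:keep]
    return population[0]

# --- Toy stand-ins (assumptions for illustration only) ---
TARGET = {"red", "mug", "left"}
VOCAB = ["red", "blue", "mug", "plate", "left", "right", "the"]

def toy_score(prompt: str) -> float:
    """Reward target keywords, lightly penalize prompt length."""
    words = prompt.split()
    return len(TARGET & set(words)) - 0.1 * len(words)

def toy_judge(a: str, b: str) -> str:
    return a if toy_score(a) >= toy_score(b) else b

def toy_mutate(prompt: str, rng: random.Random) -> str:
    """Randomly replace one word or append one word from VOCAB."""
    words = prompt.split()
    if words and rng.random() < 0.5:
        words[rng.randrange(len(words))] = rng.choice(VOCAB)
    else:
        words.append(rng.choice(VOCAB))
    return " ".join(words)

best = evolve(["the thing", "blue plate"], toy_judge, toy_mutate)
print(best)
```

Because the top-ranked prompts are carried over unchanged into the next generation, the best prompt under this toy judge never gets worse from one generation to the next. In a real system, the judge would presumably be a model comparing two candidate masks, which is far more expensive per comparison than this keyword scorer.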

Kai Ye, Xiaotong You, Jianghang Lin, Jiayi Ji, Pingyang Dai, Liujuan Cao • 2025

Related benchmarks

Task	Dataset	Result
Reasoning Segmentation	ReasonSeg (val)	gIoU 70.7	327
Referring Expression Segmentation	RefCOCO (testA)	cIoU 73.7	315
Referring Expression Segmentation	RefCOCO+ (testA)	--	288
Referring Expression Segmentation	RefCOCO+ (val)	--	272
Referring Expression Segmentation	RefCOCO (val)	--	261
Referring Expression Segmentation	RefCOCO (testB)	--	259
Referring Expression Segmentation	RefCOCO+ (testB)	--	256
Reasoning Segmentation	ReasonSeg (test)	gIoU 72.5	236
Referring Expression Segmentation	RefCOCOg (val)	cIoU 65.9	172
Referring Expression Segmentation	RefCOCO UMD (val)	cIoU 68.7	50

