Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting

About

Reasoning Segmentation requires models to interpret complex, context-dependent linguistic queries to achieve pixel-level localization. Current dominant approaches rely heavily on Supervised Fine-Tuning (SFT) or Reinforcement Learning (RL). However, SFT suffers from catastrophic forgetting and domain dependency, while RL is often hindered by training instability and rigid reliance on predefined reward functions. Although recent training-free methods circumvent these training burdens, they are fundamentally limited by a static inference paradigm. These methods typically rely on a single-pass "generate-then-segment" chain, which suffers from insufficient reasoning depth and lacks the capability to self-correct linguistic hallucinations or spatial misinterpretations. In this paper, we challenge these limitations and propose EVOL-SAM3, a novel zero-shot framework that reformulates reasoning segmentation as an inference-time evolutionary search process. Instead of relying on a fixed prompt, EVOL-SAM3 maintains a population of prompt hypotheses and iteratively refines them through a "Generate-Evaluate-Evolve" loop. We introduce a Visual Arena to assess prompt fitness via reference-free pairwise tournaments, and a Semantic Mutation operator to inject diversity and correct semantic errors. Furthermore, a Heterogeneous Arena module integrates geometric priors with semantic reasoning to ensure robust final selection. Extensive experiments demonstrate that EVOL-SAM3 not only substantially outperforms static baselines but also significantly surpasses fully supervised state-of-the-art methods on the challenging ReasonSeg benchmark in a zero-shot setting. The code is available at https://github.com/AHideoKuzeA/Evol-SAM3.
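The "Generate-Evaluate-Evolve" loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `evolve_prompts`, the `mutate` and `prefer` callbacks, and the toy judge are all hypothetical stand-ins. In the paper, fitness comes from the Visual Arena's pairwise tournaments and diversity from the Semantic Mutation operator; here both are reduced to plain callables so the control flow is visible.

```python
import random
from typing import Callable, List, Optional


def evolve_prompts(
    seeds: List[str],
    mutate: Callable[[str], str],
    prefer: Callable[[str, str], str],
    generations: int = 3,
    rng: Optional[random.Random] = None,
) -> str:
    """Toy evolutionary search over text prompts (hypothetical sketch).

    mutate: returns a varied copy of a prompt (stand-in for semantic mutation).
    prefer: returns the better of two prompts (stand-in for a pairwise judge).
    """
    if not seeds:
        raise ValueError("need at least one seed prompt")
    rng = rng or random.Random(0)
    population = list(seeds)  # Generate: initial prompt hypotheses

    for _ in range(generations):
        # Evaluate: random pairwise tournaments, the winner of each pair survives.
        rng.shuffle(population)
        survivors = [
            prefer(population[i], population[i + 1])
            for i in range(0, len(population) - 1, 2)
        ]
        if len(population) % 2 == 1:
            survivors.append(population[-1])  # unpaired prompt gets a bye

        # Evolve: keep the survivors and add one mutated child per survivor.
        population = survivors + [mutate(p) for p in survivors]

    # Final selection: run the judge across the whole final population.
    best = population[0]
    for candidate in population[1:]:
        best = prefer(best, candidate)
    return best


# Toy stand-ins: the "judge" simply prefers the longer prompt, and the
# "mutation" appends a marker. A real system would compare segmentation
# masks and rewrite the prompt semantically.
def toy_judge(a: str, b: str) -> str:
    return a if len(a) >= len(b) else b


def toy_mutate(prompt: str) -> str:
    return prompt + "!"


print(evolve_prompts(["a", "bb", "ccc", "dddd"], toy_mutate, toy_judge))
# prints "dddd!!!"
```

Because the judge only ever compares two candidates at a time, the loop needs no ground-truth mask, which is the sense in which the tournaments are reference-free.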

Kai Ye, Xiaotong You, Jianghang Lin, Jiayi Ji, Pingyang Dai, Liujuan Cao • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Referring Expression Segmentation | RefCOCO (testA) | cIoU 73.7 | 217 |
| Reasoning Segmentation | ReasonSeg (val) | cIoU 63.4 | 145 |
| Referring Expression Segmentation | RefCOCOg (val) | cIoU 65.9 | 107 |
| Reasoning Segmentation | ReasonSeg (test) | gIoU 72.5 | 102 |
| Referring Expression Segmentation | RefCOCO UMD (val) | cIoU 68.7 | 50 |
| Reasoning Segmentation | ReasonSeg long (test) | cIoU 0.733 | 40 |
| Reasoning Segmentation | ReasonSeg short (test) | cIoU 47.3 | 40 |
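
The Result column reports two metrics, cIoU and gIoU, which this page does not define. The sketch below assumes the definitions commonly used for these benchmarks: gIoU as the mean of per-image IoU values, and cIoU as the cumulative intersection divided by the cumulative union over all images. The function name `giou_ciou` and the convention of scoring an empty union as 1.0 are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2D binary mask, 1 = foreground


def giou_ciou(preds: Sequence[Mask], gts: Sequence[Mask]) -> Tuple[float, float]:
    """Return (gIoU, cIoU) for paired predicted and ground-truth masks."""
    if not preds or len(preds) != len(gts):
        raise ValueError("need equal, non-zero numbers of masks")
    total_inter = 0
    total_union = 0
    per_image_iou: List[float] = []

    for pred, gt in zip(preds, gts):
        inter = 0
        union = 0
        for pred_row, gt_row in zip(pred, gt):
            for p, g in zip(pred_row, gt_row):
                inter += 1 if (p and g) else 0
                union += 1 if (p or g) else 0
        # Assumption: two empty masks count as a perfect match.
        per_image_iou.append(inter / union if union else 1.0)
        total_inter += inter
        total_union += union

    giou = sum(per_image_iou) / len(per_image_iou)
    ciou = total_inter / total_union if total_union else 1.0
    return giou, ciou


# Image 1: IoU = 1/2. Image 2: IoU = 4/4.
preds = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
gts = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]
giou, ciou = giou_ciou(preds, gts)
print(round(giou, 4), round(ciou, 4))
# prints "0.75 0.8333"
```

Under these definitions cIoU weights every pixel equally, so images with large objects dominate it, while gIoU weights every image equally.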