# LENS: Learning to Segment Anything with Unified Reinforced Reasoning
## About
Text-prompted image segmentation enables fine-grained visual understanding and is critical for applications such as human-computer interaction and robotics. However, existing supervised fine-tuning methods typically ignore explicit chain-of-thought (CoT) reasoning at test time, which limits their ability to generalize to unseen prompts and domains. To address this issue, we introduce LENS, a scalable reinforcement-learning framework that jointly optimizes the reasoning process and segmentation in an end-to-end manner. We propose unified reinforcement-learning rewards that span sentence-, box-, and segment-level cues, encouraging the model to generate informative CoT rationales while refining mask quality. Built on a publicly available 3-billion-parameter vision-language model, Qwen2.5-VL-3B-Instruct, LENS achieves an average cIoU of 81.2% on the RefCOCO, RefCOCO+, and RefCOCOg benchmarks, outperforming the strong fine-tuned baseline GLaMM by up to 5.6%. These results demonstrate that RL-driven CoT reasoning significantly enhances text-prompted segmentation and offers a practical path toward more generalizable Segment Anything models (SAM). Code is available at https://github.com/hustvl/LENS.
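The abstract names three reward levels but not their exact form. For intuition only, here is a minimal Python sketch of what a unified sentence-, box-, and segment-level reward could look like; the helper functions, the `<think>` tag check, and the weights `w_sent`/`w_box`/`w_seg` are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def box_iou(pred, gt):
    """IoU between two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    x2, y2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(pred) + area(gt) - inter
    return inter / union if union > 0 else 0.0

def mask_iou(pred, gt):
    """IoU between two boolean segmentation masks of the same shape."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 0.0

def unified_reward(cot_text, pred_box, gt_box, pred_mask, gt_mask,
                   w_sent=0.2, w_box=0.3, w_seg=0.5):
    """Hypothetical unified RL reward combining three levels of supervision.

    Weights and the sentence-level format check are assumptions made for
    illustration; the paper only states that the reward spans sentence-,
    box-, and segment-level cues.
    """
    # Sentence level: reward a well-formed CoT rationale (here, a tagged block).
    r_sent = 1.0 if "<think>" in cot_text and "</think>" in cot_text else 0.0
    # Box level: overlap of the predicted grounding box with the annotation.
    r_box = box_iou(pred_box, gt_box)
    # Segment level: overlap of the predicted mask with the ground-truth mask.
    r_seg = mask_iou(pred_mask, gt_mask)
    return w_sent * r_sent + w_box * r_box + w_seg * r_seg
```

Such a scalar reward would then drive a policy-gradient update of the vision-language model, so that better rationales and tighter masks are reinforced jointly rather than in separate stages.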
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Referring Expression Segmentation | RefCOCO (testA) | cIoU | 85.3 | 217 |
| Reasoning Segmentation | ReasonSeg (val) | cIoU | 64.9 | 145 |
| Reasoning Segmentation | ReasonSeg (test) | gIoU | 57.2 | 102 |
| Referring Expression Segmentation | RefCOCO UMD (val) | cIoU | 84.2 | 50 |
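The table mixes two metrics that are standard in this literature: cIoU accumulates intersection and union over the whole split before dividing, while gIoU averages per-image IoU. A minimal sketch of the distinction over boolean masks (the function name and I/O convention are assumptions):

```python
import numpy as np

def ciou_giou(pred_masks, gt_masks):
    """Compute cIoU and gIoU over paired lists of boolean (H, W) masks.

    cIoU: sum of intersections / sum of unions across all samples.
    gIoU: mean of per-sample IoUs (an empty-union sample scores 0 here).
    """
    total_inter, total_union, per_image = 0, 0, []
    for pred, gt in zip(pred_masks, gt_masks):
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        total_inter += inter
        total_union += union
        per_image.append(inter / union if union > 0 else 0.0)
    ciou = total_inter / total_union if total_union > 0 else 0.0
    giou = float(np.mean(per_image)) if per_image else 0.0
    return ciou, giou
```

Because cIoU pools pixels before dividing, large objects dominate it, whereas gIoU weights every image equally; this is why the two numbers can diverge on the same split.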