LENS: Learning to Segment Anything with Unified Reinforced Reasoning
About
Text-prompted image segmentation enables fine-grained visual understanding and is critical for applications such as human-computer interaction and robotics. However, existing supervised fine-tuning methods typically ignore explicit chain-of-thought (CoT) reasoning at test time, which limits their ability to generalize to unseen prompts and domains. To address this issue, we introduce LENS, a scalable reinforcement-learning framework that jointly optimizes the reasoning process and segmentation in an end-to-end manner. We propose unified reinforcement-learning rewards that span sentence-, box-, and segment-level cues, encouraging the model to generate informative CoT rationales while refining mask quality. Using a publicly available 3-billion-parameter vision-language model, i.e., Qwen2.5-VL-3B-Instruct, LENS achieves an average cIoU of 81.2% on the RefCOCO, RefCOCO+, and RefCOCOg benchmarks, outperforming the strong fine-tuned method, i.e., GLaMM, by up to 5.6%. These results demonstrate that RL-driven CoT reasoning significantly enhances text-prompted segmentation and offers a practical path toward more generalizable Segment Anything models (SAM). Code is available at https://github.com/hustvl/LENS.
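The abstract describes rewards that combine sentence-, box-, and segment-level cues. A minimal sketch of how such a combined reward might be computed is given below. It is an illustration under stated assumptions, not the LENS implementation: the `<think>`/`<answer>` response format, the equal weights, and all function names are hypothetical and do not come from the paper or its repository.

```python
import re
import numpy as np

def format_reward(response: str) -> float:
    """Sentence-level cue (assumed): 1.0 if the response contains a reasoning
    block followed by an answer block, else 0.0. The tag names are hypothetical."""
    pattern = r"\s*<think>.+?</think>\s*<answer>.+?</answer>\s*"
    return 1.0 if re.fullmatch(pattern, response, flags=re.DOTALL) else 0.0

def box_iou(a, b) -> float:
    """Box-level cue: IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mask_iou(pred, gt) -> float:
    """Segment-level cue: IoU of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)

def unified_reward(response, pred_box, gt_box, pred_mask, gt_mask,
                   weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three cues. The weights are an assumption."""
    w_fmt, w_box, w_seg = weights
    return (w_fmt * format_reward(response)
            + w_box * box_iou(pred_box, gt_box)
            + w_seg * mask_iou(pred_mask, gt_mask))
```

In a policy-gradient setup, a scalar of this kind would be computed per sampled response and used as the reward signal; how LENS actually weights or shapes each term should be checked against the released code.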
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Reasoning Segmentation | ReasonSeg (val) | gIoU 62.1 | 327 |
| Referring Expression Segmentation | RefCOCO (testA) | cIoU 85.3 | 315 |
| Referring Expression Segmentation | RefCOCO+ (testA) | -- | 288 |
| Referring Expression Segmentation | RefCOCO+ (val) | -- | 272 |
| Referring Expression Segmentation | RefCOCO (val) | -- | 261 |
| Referring Expression Segmentation | RefCOCO (testB) | -- | 259 |
| Referring Expression Segmentation | RefCOCO+ (testB) | -- | 256 |
| Reasoning Segmentation | ReasonSeg (test) | gIoU 57.2 | 236 |
| Referring Expression Segmentation | RefCOCO UMD (val) | cIoU 84.2 | 50 |
| Concept Segmentation | CD Concepts Saliency | Weighted F-measure (Fw_beta) 76.9 | 7 |
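
The table reports two mask-overlap metrics, cIoU and gIoU. As commonly defined for referring and reasoning segmentation, cIoU is the cumulative intersection over the cumulative union across the whole dataset, while gIoU is the mean of per-image IoU scores (it is not the "generalized IoU" used for bounding boxes). The sketch below illustrates the difference; the function names and the handling of empty masks are assumptions made for this example, not taken from the LENS evaluation code.

```python
import numpy as np

def mask_iou(pred, gt) -> float:
    """IoU of two binary masks. Two empty masks score 1.0 (an assumed convention)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def giou(preds, gts) -> float:
    """gIoU: average of the per-image IoU values."""
    return float(np.mean([mask_iou(p, g) for p, g in zip(preds, gts)]))

def ciou(preds, gts) -> float:
    """cIoU: total intersection pixels divided by total union pixels."""
    inter = 0
    union = 0
    for p, g in zip(preds, gts):
        p = np.asarray(p, dtype=bool)
        g = np.asarray(g, dtype=bool)
        inter += np.logical_and(p, g).sum()
        union += np.logical_or(p, g).sum()
    return float(inter / union) if union > 0 else 1.0

# Two toy 2x2 images: per-image IoUs are 0.5 and 1.0.
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
gts = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
print(giou(preds, gts))  # 0.75
print(ciou(preds, gts))  # 2 / 3, since intersections sum to 2 and unions to 3
```

Because cIoU pools pixels across images, large objects contribute more to it than small ones, whereas gIoU weights every image equally.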