
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning

About

Large vision-language models exhibit inherent capabilities to handle diverse visual perception tasks. In this paper, we introduce VisionReasoner, a unified framework capable of reasoning about and solving multiple visual perception tasks within a single shared model. Specifically, by designing a unified reward mechanism and multi-object cognitive learning strategies, VisionReasoner strengthens its ability to reason about visual inputs and addresses diverse perception tasks within one model. VisionReasoner generates a structured reasoning process before delivering the outputs requested by the user's query. Human evaluation shows that the reasoning process of VisionReasoner is faithful and reliable, even though the model is trained without annotated reasoning data. To rigorously assess unified visual perception capabilities, we evaluate VisionReasoner on ten diverse tasks spanning three critical domains: detection, segmentation, and counting. Experimental results show that VisionReasoner achieves superior performance as a unified model, outperforming the baseline Qwen2.5VL by relative margins of 29.1% on COCO (detection), 22.1% on ReasonSeg (segmentation), and 13.2% on CountBench (counting).
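The gains above are stated as relative margins over the Qwen2.5VL baseline. A minimal sketch, assuming the usual definition (score difference divided by the baseline score), since the abstract does not spell out the formula. The function name `relative_margin` is hypothetical, and the numbers in the example are illustrative, not scores from the paper.

```python
def relative_margin(model_score: float, baseline_score: float) -> float:
    """Relative improvement of model_score over baseline_score.

    Assumes the common definition (model - baseline) / baseline;
    the paper's exact formula is not stated in the abstract.
    """
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return (model_score - baseline_score) / baseline_score


# Illustrative numbers only (not results from the paper):
# a baseline of 40.0 improved to 51.6 is a 29% relative gain.
print(f"{relative_margin(51.6, 40.0):.1%}")  # prints 29.0%
```

A relative margin is not the same as an absolute difference in points: in the example, a 29% relative gain corresponds to 11.6 points on a 40-point baseline.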

Yuqi Liu, Tianyuan Qu, Zhisheng Zhong, Bohao Peng, Shu Liu, Bei Yu, Jiaya Jia • 2025

Related benchmarks

Task | Dataset | Result | Rank
Referring Expression Comprehension | RefCOCO+ (val) | -- | 345
Referring Expression Comprehension | RefCOCO (val) | Accuracy 88.7 | 335
Referring Expression Comprehension | RefCOCO (testA) | -- | 333
Referring Expression Comprehension | RefCOCOg (test) | Accuracy 88.7 | 291
Referring Expression Comprehension | RefCOCOg (val) | Accuracy 88.7 | 291
Referring Expression Segmentation | RefCOCO (testA) | -- | 217
Referring Expression Segmentation | RefCOCO+ (testA) | -- | 190
Referring Expression Comprehension | RefCOCO+ (test-A) | Accuracy 88.7 | 172
Reasoning Segmentation | ReasonSeg (val) | cIoU 60.3 | 145
Reasoning Segmentation | ReasonSeg (test) | gIoU 65.5 | 102

(10 of 51 rows shown.)
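The table mixes three metrics. A minimal sketch of how they are commonly computed, assuming the standard definitions for these benchmarks: referring expression comprehension accuracy counts a predicted box as correct when its IoU with the ground-truth box is at least 0.5; gIoU averages the per-image mask IoU; cIoU divides the cumulative intersection by the cumulative union over the whole split. The page does not state the exact evaluation code used, and the function names here are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def rec_accuracy(preds: Sequence[Box], gts: Sequence[Box], thresh: float = 0.5) -> float:
    """Fraction of predicted boxes whose IoU with the ground truth reaches thresh."""
    hits = sum(box_iou(p, g) >= thresh for p, g in zip(preds, gts))
    return hits / len(gts)


def giou_ciou(
    pred_masks: Sequence[Sequence[int]], gt_masks: Sequence[Sequence[int]]
) -> Tuple[float, float]:
    """Return (gIoU, cIoU) for flattened 0/1 masks, one pair per image.

    gIoU = mean of per-image IoUs; cIoU = total intersection / total union.
    """
    per_image, total_inter, total_union = [], 0, 0
    for pred, gt in zip(pred_masks, gt_masks):
        inter = sum(1 for p, g in zip(pred, gt) if p and g)
        union = sum(1 for p, g in zip(pred, gt) if p or g)
        # Assumed convention: an image where both masks are empty counts as IoU 1.0.
        per_image.append(inter / union if union else 1.0)
        total_inter += inter
        total_union += union
    giou = sum(per_image) / len(per_image)
    ciou = total_inter / total_union if total_union else 1.0
    return giou, ciou


# Toy inputs: one exact box match and one box shifted by half its width.
print(rec_accuracy([(0, 0, 10, 10), (0, 0, 10, 10)], [(0, 0, 10, 10), (5, 0, 15, 10)]))  # 0.5
# Toy masks: a small object at IoU 0.5 and a large object matched exactly.
print(giou_ciou([[1, 1, 0, 0], [1, 1, 1, 1]], [[1, 0, 0, 0], [1, 1, 1, 1]]))  # (0.75, 0.8333333333333334)
```

The toy masks show why the two segmentation metrics can differ: cIoU pools pixels across images, so large objects weigh more, while gIoU gives every image equal weight.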
