One RL to See Them All: Visual Triple Unified Reinforcement Learning

About

Reinforcement learning (RL) is becoming an important direction for post-training vision-language models (VLMs), but public training methodologies for unified multimodal RL remain much less mature, especially for heterogeneous reasoning and perception-heavy tasks. We propose V-Triune, a Visual Triple Unified Reinforcement Learning methodology for unified multimodal RL. It organizes training around three coordinated abstractions: Sample-Level Reward Routing, Verifier-Level Outcome Verification, and Source-Level Diagnostics. Within this methodology, Dynamic IoU provides localization-specific reward shaping that avoids reward ambiguity under loose thresholds and reward sparsity under strict ones. Built on V-Triune, we develop Orsta (7B, 32B), a family of models jointly trained on eight reasoning and perception tasks. Under matched budgets, unified training matches or outperforms specialist mixtures. The final Orsta models improve over their backbones on MEGA-Bench, compare favorably with strong multi-task RL-VLM baselines, and transfer these gains to a broad set of downstream benchmarks. These results show that unified RL can improve both reasoning and perception within a single VLM RL pipeline. The V-Triune system, along with the Orsta models, is publicly available at https://github.com/MiniMax-AI/One-RL-to-See-Them-All.
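The Dynamic IoU idea described above can be illustrated with a minimal sketch: a binary localization reward whose IoU threshold tightens as training progresses, so early rewards are not too sparse and late rewards are not too ambiguous. This is only an illustration under stated assumptions. The function names, the binary reward form, and the schedule values (0.85, 0.95, 0.99 at 10%, 25%, and 100% of training) are hypothetical placeholders and are not taken from this page.

```python
from typing import Sequence, Tuple

Box = Sequence[float]  # (x1, y1, x2, y2), with x2 >= x1 and y2 >= y1

# Hypothetical schedule: (fraction of training at which the stage ends, IoU threshold).
# These values are illustrative assumptions, not confirmed settings.
DEFAULT_SCHEDULE: Tuple[Tuple[float, float], ...] = (
    (0.10, 0.85),
    (0.25, 0.95),
    (1.00, 0.99),
)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dynamic_threshold(progress: float, schedule=DEFAULT_SCHEDULE) -> float:
    """Return the IoU threshold for a training progress value in [0, 1]."""
    for stage_end, threshold in schedule:
        if progress < stage_end:
            return threshold
    return schedule[-1][1]  # at or past the end of training


def dynamic_iou_reward(pred: Box, gt: Box, progress: float) -> float:
    """Binary reward: 1.0 if the IoU meets the current threshold, else 0.0."""
    return 1.0 if iou(pred, gt) >= dynamic_threshold(progress) else 0.0
```

In this sketch, a prediction with IoU 0.9 would be rewarded early in training but not later, once the threshold has tightened.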

Yan Ma, Linge Du, Xuyang Shen, Shaoxiang Chen, Pengfei Li, Qibing Ren, Lizhuang Ma, Yuchao Dai, Pengfei Liu, Junjie Yan • 2025

Related benchmarks

Task	Dataset	Result	
Visual Mathematical Reasoning	MathVision	Accuracy 28.2	254
Visual Mathematical Reasoning	MathVista (testmini)	Accuracy 72.5	88
General Visual Reasoning	MMStar	Accuracy 59.6	46
Visual Search	V*	Accuracy 78	28
Hallucination Evaluation	POPE Overall	Accuracy 86.9	21
Visual Mathematical Reasoning	MathVerse Vision Only	Accuracy 42.9	18
Visual Mathematical Reasoning	WeMath strict	Accuracy 31.8	18
Spatial Reasoning	SpatialScore hard	Accuracy 16.4	18
Medical Grounding	HAM10000	A@0.5 88.92	16
Medical Grounding	TN3K	A@0.5 43.5	16

Showing 10 of 11 rows
