R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

About

Multimodal Large Language Models (MLLMs) equipped with step-by-step thinking capabilities have demonstrated remarkable performance on complex reasoning problems. However, this thinking process is redundant for simple problems that can be solved without complex reasoning. To address this inefficiency, we propose R-4B, an auto-thinking MLLM that adaptively decides when to think based on problem complexity. The central idea of R-4B is to equip the model with both thinking and non-thinking capabilities using bi-mode annealing, and then apply Bi-mode Policy Optimization (BPO) to improve the model's accuracy in determining whether to activate the thinking process. Specifically, we first train the model on a carefully curated dataset spanning various topics, which contains samples from both thinking and non-thinking modes. The model then undergoes a second phase of training under an improved GRPO framework, where the policy model is forced to generate responses from both modes for each input query. Experimental results show that R-4B achieves state-of-the-art performance across 25 challenging benchmarks. It outperforms Qwen2.5-VL-7B on most tasks and achieves performance comparable to larger models such as Kimi-VL-A3B-Thinking-2506 (16B) on reasoning-intensive benchmarks at lower computational cost.
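The second training phase described above can be illustrated with a minimal sketch. It shows one way a rollout group could be forced to contain responses from both modes and then scored with GRPO-style group-relative advantages (reward minus group mean, divided by group standard deviation). This is not the authors' implementation: the names `Rollout`, `bi_mode_group`, `generate`, `reward_fn`, and `per_mode` are hypothetical, and normalizing over the combined thinking plus non-thinking group is an assumption, since the abstract does not specify how BPO computes advantages.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List

MODES = ("thinking", "non_thinking")


@dataclass
class Rollout:
    mode: str            # which mode the response was forced into
    response: str        # text returned by the (hypothetical) policy
    reward: float        # scalar reward for this response
    advantage: float = 0.0


def bi_mode_group(
    query: str,
    generate: Callable[[str, str], str],     # hypothetical: (query, mode) -> response
    reward_fn: Callable[[str, str], float],  # hypothetical: (query, response) -> reward
    per_mode: int = 2,
    eps: float = 1e-6,
) -> List[Rollout]:
    """Sample per_mode responses in EACH mode, then normalize rewards group-wide."""
    rollouts: List[Rollout] = []
    for mode in MODES:
        for _ in range(per_mode):
            response = generate(query, mode)
            rollouts.append(Rollout(mode, response, reward_fn(query, response)))

    # GRPO-style advantage: standardize each reward against the whole group.
    # Assumption: the group is the union of both modes' rollouts.
    rewards = [r.reward for r in rollouts]
    mu, sigma = mean(rewards), pstdev(rewards)
    for r in rollouts:
        r.advantage = (r.reward - mu) / (sigma + eps)
    return rollouts
```

Under this assumed scheme, a mode whose responses earn above-average reward for a given query receives positive advantages, which is one plausible signal for learning when thinking is worth activating.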

Qi Yang, Bolin Ni, Shiming Xiang, Han Hu, Houwen Peng, Jie Jiang • 2025

Related benchmarks

Task	Dataset	Metric	Result
Reasoning	ARC	Accuracy	84.7	269
Mathematical Reasoning	MATH500	Accuracy	74.4	104
Mathematical Reasoning	Olympiad	Pass@1 Accuracy	34.9	60
Multimodal Dental Question Answering	MMOral-Uni	II-Loc	3.8	32
Multimodal Dental Image Analysis	MMOral-Uni 1.0 (test)	Loc Score	3.8	28
Question Answering	CSQA	Accuracy (CSQA)	85.7	24
General Reasoning	Aggregate Easy and Hard Benchmarks	Accuracy	60.9	18
Massive Multitask Language Understanding	MMLU-Pro	Accuracy	46.9	18
Math Word Problems	GSM8K	Accuracy	88.6	18
Mathematical Reasoning	AIME 2024	Accuracy (ACC)	11.7	18

Showing 10 of 16 rows
