MM-Eureka: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning

About

DeepSeek R1 and o1 have demonstrated powerful reasoning capabilities in the text domain through stable large-scale reinforcement learning. To enable broader applications, some works have attempted to transfer these capabilities to multimodal reasoning. However, these efforts have been constrained by the limited difficulty of the selected tasks and relatively small training scales, making it challenging to demonstrate strong multimodal reasoning abilities. To address this gap, we introduce the MMK12 dataset and MM-EUREKA with 7B and 32B parameters. The former is a high-quality multimodal mathematics reasoning dataset featuring diverse knowledge domains with human-verified answers and solution processes. The latter is a multimodal model employing rule-based reinforcement learning on MMK12, utilizing online filtering and a two-stage training strategy to enhance training stability. MM-EUREKA demonstrates remarkable performance gains in multimodal mathematical reasoning, outperforming previous powerful models such as InternVL2.5-78B or InternVL2.5-38B-MPO. In particular, MM-EUREKA achieves competitive or superior performance compared to both open-source and closed-source models, and trails slightly behind o1 in multidisciplinary reasoning tasks. We open-source our complete pipeline, including all code, models, and data, to foster further research in this area: https://github.com/ModalMinds/MM-EUREKA

Fanqing Meng, Lingxiao Du, Zongkai Liu, Zhixiang Zhou, Quanfeng Lu, Daocheng Fu, Tiancheng Han, Botian Shi, Wenhai Wang, Junjun He, Kaipeng Zhang, Ping Luo, Yu Qiao, Qiaosheng Zhang, Wenqi Shao • 2025
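
The abstract mentions rule-based reinforcement learning with online filtering but does not spell out either mechanism. As a rough illustration only, the sketch below shows one common way such a setup can look: a reward computed by a fixed rule (exact match on a final boxed answer) and a filter that drops prompts whose sampled responses all receive the same reward, since those carry no relative learning signal. The boxed-answer format, the exact-match check, the filtering criterion, and every function name here are assumptions made for illustration, not details taken from the paper or its repository.

```python
import re
from typing import List, Optional, Tuple


def extract_answer(response: str) -> Optional[str]:
    # Assumed answer format: the final answer is wrapped in a LaTeX-style boxed{...}.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return matches[-1].strip() if matches else None


def rule_based_reward(response: str, reference: str) -> float:
    # Hypothetical rule: 1.0 for an exact string match with the reference, else 0.0.
    answer = extract_answer(response)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


def keep_prompt(rewards: List[float]) -> bool:
    # Assumed filtering rule: keep a prompt only if its sampled rewards differ,
    # i.e. the responses are neither all correct nor all incorrect.
    return len(set(rewards)) > 1


def filter_batch(
    groups: List[List[str]], references: List[str]
) -> List[Tuple[int, List[float]]]:
    # Score every sampled response per prompt and return (prompt index, rewards)
    # for the prompts that survive filtering.
    kept = []
    for idx, (responses, ref) in enumerate(zip(groups, references)):
        rewards = [rule_based_reward(r, ref) for r in responses]
        if keep_prompt(rewards):
            kept.append((idx, rewards))
    return kept
```

In this sketch, a prompt with one correct and one incorrect sample is kept, while a prompt whose samples are all correct or all unparseable is dropped before the policy update.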

Related benchmarks

Task	Dataset	Result
Visual Question Answering	ChartQA	Accuracy 77.3	620
Mathematical Reasoning	MathVista	Score 71.9	566
Multimodal Understanding	MMStar	Accuracy 60.4	511
Visual Mathematical Reasoning	MathVista	Accuracy 70.6	448
Multi-discipline Multimodal Understanding	MMMU	--	422
Visual Question Answering	AI2D	Accuracy 84.1	402
Mathematical Reasoning	MathVista	Accuracy 73	382
Mathematical Reasoning	WeMath	Accuracy 66.6	317
Visual Mathematical Reasoning	MathVision	Accuracy 32.23	298
Mathematical Multimodal Reasoning	MathVista	Accuracy 74.8	276

Showing 10 of 167 rows
