MM-Eureka: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning

About

DeepSeek R1 and o1 have demonstrated powerful reasoning capabilities in the text domain through stable large-scale reinforcement learning. To enable broader applications, some works have attempted to transfer these capabilities to multimodal reasoning. However, these efforts have been constrained by the low difficulty of the selected tasks and by relatively small training scales, making it hard to demonstrate strong multimodal reasoning abilities. To address this gap, we introduce the MMK12 dataset and MM-EUREKA, a model with 7B and 32B parameter variants. The former is a high-quality multimodal mathematics reasoning dataset featuring diverse knowledge domains with human-verified answers and solution processes. The latter is a multimodal model trained with rule-based reinforcement learning on MMK12, using online filtering and a two-stage training strategy to improve training stability. MM-EUREKA shows substantial performance gains in multimodal mathematical reasoning, outperforming previous strong models such as InternVL2.5-78B and InternVL2.5-38B-MPO. In particular, MM-EUREKA achieves competitive or superior performance compared to both open-source and closed-source models, and trails o1 only slightly in multidisciplinary reasoning tasks. We open-source our complete pipeline to foster further research in this area. All code, models, and data are released at https://github.com/ModalMinds/MM-EUREKA
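
As a rough illustration of what rule-based rewards and online filtering can look like in practice, the following is a minimal sketch and not the MM-EUREKA implementation. It assumes that the final answer is wrapped in \boxed{...}, that correctness is an exact string match, and that online filtering drops prompts whose sampled responses all receive the same reward. The function names (extract_answer, rule_based_reward, online_filter) are hypothetical; the released repository should be consulted for the actual rules.

```python
import re
from typing import Dict, List, Optional

# Assumption: the model writes its final answer as \boxed{...} (no nested braces).
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_answer(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the response, if any."""
    matches = _BOXED.findall(response)
    return matches[-1].strip() if matches else None


def rule_based_reward(response: str, reference: str) -> float:
    """Return 1.0 if the extracted answer equals the reference exactly, else 0.0."""
    answer = extract_answer(response)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


def online_filter(groups: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep only prompts whose sampled rewards are not all identical.

    Assumption: a prompt whose samples are all correct or all wrong gives
    no relative learning signal, so it is dropped from the batch.
    """
    return {prompt: r for prompt, r in groups.items() if len(set(r)) > 1}
```

Here online_filter takes a mapping from a prompt identifier to the rewards of its sampled responses, so a prompt with rewards [1.0, 0.0] is kept while one with [1.0, 1.0] is discarded.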

Fanqing Meng, Lingxiao Du, Zongkai Liu, Zhixiang Zhou, Quanfeng Lu, Daocheng Fu, Tiancheng Han, Botian Shi, Wenhai Wang, Junjun He, Kaipeng Zhang, Ping Luo, Yu Qiao, Qiaosheng Zhang, Wenqi Shao • 2025

Related benchmarks

Task                                       Dataset     Result          Rank
Visual Question Answering                  ChartQA     Accuracy 77.3   371
Multimodal Understanding                   MMStar      Accuracy 60.4   324
Visual Mathematical Reasoning              MathVista   Accuracy 70.6   278
Mathematical Reasoning                     MathVista   Accuracy 73     257
Visual Question Answering                  AI2D        Accuracy 84.1   249
Mathematical Multimodal Reasoning          MathVerse   Accuracy 67.15  221
Mathematical Multimodal Reasoning          MathVista   Accuracy 74.8   218
Multi-discipline Multimodal Understanding  MMMU (val)  --              204
Visual Mathematical Reasoning              MathVision  Accuracy 27.4   186
Multimodal Math Reasoning                  MathVision  Accuracy 34.4   183
Showing 10 of 126 rows