Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs

About

Reasoning is essential for large language models (LLMs), especially in complex tasks such as mathematical problem solving. However, multimodal reasoning still faces challenges in modality alignment and training scalability, as many existing methods rely on additional annotations or complex rule-based rewards. To address these issues, we propose the Deliberate-to-Intuitive reasoning framework (D2I), which improves the understanding and reasoning abilities of multimodal LLMs (MLLMs) without extra annotations or complex rewards. During training, D2I uses deliberate reasoning strategies supervised only by rule-based format rewards to enhance modality alignment. During inference, it shifts to intuitive reasoning by removing these explicit strategies, allowing the model to implicitly apply the acquired abilities in its responses. D2I outperforms baselines on both in-domain and out-of-domain benchmarks, highlighting the effectiveness of format rewards in fostering transferable multimodal reasoning skills and suggesting the benefit of decoupling training-time reasoning depth from test-time response flexibility.
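As a rough illustration of the two ideas in the abstract, the sketch below shows what a rule-based format reward and a train/inference prompt switch might look like. It is not the authors' implementation: the tag names (`describe`, `think`, `answer`), the instruction text, and the function names `format_reward` and `build_prompt` are all hypothetical, since the abstract does not specify the actual output format or strategies.

```python
import re

# Hypothetical output layout: the abstract does not name the actual tags,
# so <describe>/<think>/<answer> are assumptions made for illustration only.
DELIBERATE_PATTERN = re.compile(
    r"\s*<describe>.+?</describe>\s*<think>.+?</think>\s*<answer>.+?</answer>\s*",
    re.DOTALL,
)

# Hypothetical instruction appended only during training.
DELIBERATE_INSTRUCTION = (
    "First describe the image inside <describe></describe>, "
    "then reason inside <think></think>, "
    "then give the final answer inside <answer></answer>."
)


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed tag layout, else 0.0.

    The reward checks structure only; it never inspects answer correctness.
    """
    return 1.0 if DELIBERATE_PATTERN.fullmatch(response) else 0.0


def build_prompt(question: str, deliberate: bool) -> str:
    """Attach the deliberate-format instruction for training; omit it at inference."""
    if deliberate:
        return question + "\n" + DELIBERATE_INSTRUCTION
    return question
```

Under these assumptions, training would call `build_prompt(question, deliberate=True)` and score sampled responses with `format_reward`, while inference would call `build_prompt(question, deliberate=False)` so the model answers without the explicit structure.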

Yahan Yu, Yuyang Dong, Masafumi Oyamada • 2025

Related benchmarks

Task	Dataset	Result
Object Hallucination Evaluation	POPE	Accuracy 88.8	2056
Multimodal Evaluation	MME	--	902
Multimodal Mathematical Reasoning	MathVista mini	Accuracy 0.722	124
Multimodal Evaluation	SEED-Bench	Accuracy 77.3	117
Multimodal Mathematical Reasoning	MathVerse mini	Accuracy 53.8	45
Multimodal Reasoning	MATH-Vision (full)	Accuracy 27.2	38
Multimodal Mathematical Reasoning	GEOQA-8k (test)	Accuracy 65	17
General Multimodal Evaluation	MMVet turbo	Overall Score 69.7	16
General Multimodal Evaluation	MMMU (val)	Accuracy 67.6	14
