
Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs

About

Reasoning is essential for large language models (LLMs), especially in complex tasks such as mathematical problem solving. However, multimodal reasoning still faces challenges in modality alignment and training scalability, as many existing methods rely on additional annotations or complex rule-based rewards. To address these issues, we propose the Deliberate-to-Intuitive reasoning framework (D2I), which improves the understanding and reasoning abilities of multimodal LLMs (MLLMs) without extra annotations or complex rewards. During training, D2I uses deliberate reasoning strategies supervised only by rule-based format rewards to enhance modality alignment. During inference, it shifts to intuitive reasoning by removing these explicit strategies, allowing the model to implicitly apply the acquired abilities in its responses. D2I outperforms baselines on both in-domain and out-of-domain benchmarks, highlighting the effectiveness of format rewards in fostering transferable multimodal reasoning skills and suggesting the benefit of decoupling training-time reasoning depth from test-time response flexibility.
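
To make the idea of a rule-based format reward concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the output layout, so the `<think>`/`<answer>` tags, the `FORMAT_PATTERN` name, and the `format_reward` function are all hypothetical. The point it illustrates is that such a reward checks only the structure of a response and never whether the answer is correct.

```python
import re

# ASSUMPTION: a hypothetical layout with a reasoning block followed by an
# answer block. The actual format used by D2I is not given in the abstract.
FORMAT_PATTERN = re.compile(
    r"\s*<think>(?P<think>.+?)</think>\s*<answer>(?P<answer>.+?)</answer>\s*",
    re.DOTALL,
)


def format_reward(response: str) -> float:
    """Return 1.0 if the whole response follows the assumed layout, else 0.0.

    Only the structure is checked; the content of the answer is never
    compared against a ground-truth label.
    """
    return 1.0 if FORMAT_PATTERN.fullmatch(response) else 0.0


if __name__ == "__main__":
    well_formed = "<think>The triangle is right-angled.</think><answer>5</answer>"
    free_form = "The answer is 5."
    print(format_reward(well_formed))
    print(format_reward(free_form))
```

Under this sketch, a training-time response must carry the explicit reasoning block to earn reward, while at inference the same model could be prompted without the tag requirement, which mirrors the deliberate-to-intuitive split described above.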

Yahan Yu, Yuyang Dong, Masafumi Oyamada • 2025

Related benchmarks

Task | Dataset | Result | Rank
Object Hallucination Evaluation | POPE | Accuracy 88.8 | 2019
Multimodal Evaluation | MME | -- | 727
Multimodal Evaluation | SEED-Bench | Accuracy 77.3 | 112
Multimodal Mathematical Reasoning | MathVista mini | Accuracy 0.722 | 111
Multimodal Mathematical Reasoning | MathVerse mini | Accuracy 53.8 | 39
Multimodal Reasoning | MATH-Vision (full) | Accuracy 27.2 | 38
Multimodal Mathematical Reasoning | GEOQA-8k (test) | Accuracy 65 | 17
General Multimodal Evaluation | MMVet turbo | Overall Score 69.7 | 16
General Multimodal Evaluation | MMMU (val) | Accuracy 67.6 | 14
