
ViRC: Enhancing Visual Interleaved Mathematical CoT with Reason Chunking

About

Chain-of-Thought (CoT) reasoning has significantly enhanced the reasoning ability of LLMs, but it faces challenges when extended to multimodal domains, particularly in mathematical tasks. Existing MLLMs typically perform textual reasoning from a single static mathematical image, overlooking dynamic visual acquisition during reasoning. In contrast, humans repeatedly examine the image and reason step by step to prove intermediate propositions. This strategy of decomposing the problem-solving process into key logical nodes adheres to Miller's Law in cognitive science. Inspired by this insight, we propose the ViRC framework for multimodal mathematical tasks, introducing a Reason Chunking mechanism that structures multimodal mathematical CoT into consecutive Critical Reasoning Units (CRUs) to simulate human expert problem-solving patterns. CRUs ensure intra-unit textual coherence for intermediate proposition verification while integrating visual information across units to generate subsequent propositions and support structured reasoning. To this end, we present the CRUX dataset, built with three visual tools and four reasoning patterns, which provides explicitly annotated CRUs across multiple reasoning paths for each mathematical problem. Leveraging the CRUX dataset, we propose a progressive training strategy inspired by human cognitive learning, comprising Instructional SFT, Practice SFT, and Strategic RL, aimed at further strengthening the model's Reason Chunking ability. The resulting ViRC-7B model achieves an 18.8% average improvement over baselines across multiple mathematical benchmarks. Code is available at https://github.com/Leon-LihongWang/ViRC.

Lihong Wang, Liangqi Li, Weiwei Feng, Jiamin Wu, Changtao Miao, Tieru Wu, Rui Ma, Bo Zhang, Zhe Li • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Mathematical Reasoning | GeoQA (test) | Accuracy | 75.07 | 31 |
| Mathematical Reasoning | MMStar Math | Accuracy | 77.2 | 19 |
| Mathematical Reasoning | MathVista Math | ALL Accuracy | 81.11 | 19 |
| Visual Reasoning | HR-Bench (test) | Accuracy | 69.94 | 15 |
| Visual Reasoning | VisualProbe (VP) cross-domain (test) | Accuracy | 0.4357 | 15 |
| Visual Reasoning | V* cross-domain (test) | Accuracy | 79.06 | 15 |
