MIND: Multi-rationale INtegrated Discriminative Reasoning Framework for Multi-modal Large Models

About

Recently, multimodal large language models (MLLMs) have been widely applied to reasoning tasks. However, they suffer from limited multi-rationale semantic modeling, insufficient logical robustness, and susceptibility to misleading cues. Therefore, we propose a Multi-rationale INtegrated Discriminative (MIND) reasoning framework, which is designed to endow MLLMs with human-like cognitive abilities of "Understand -> Rethink -> Correct", and achieves a paradigm evolution from passive imitation-based reasoning to active discriminative reasoning. Specifically, we introduce a Rationale Augmentation and Discrimination (RAD) paradigm, which provides a unified and extensible data foundation. Meanwhile, we design a Progressive Two-stage Correction Learning (P2CL) strategy. The first phase enhances multi-rationale positive learning, while the second phase enables active logic discrimination and correction. In addition, to mitigate representation entanglement in the multi-rationale semantic space, we propose a Multi-rationale Contrastive Alignment (MCA) optimization strategy. Extensive experiments show that our MIND achieves SOTA performance on multiple public datasets. Our data and code are available at https://github.com/YuChuang1205/MIND

Chuang Yu, Jinmiao Zhao, Mingxuan Zhao, Yunpeng Liu, Xiujun Shu, Yuanhao Feng, Bo Wang, Xiangyu Yue• 2025

Related benchmarks

Task	Dataset	Result
Science Question Answering	ScienceQA	IMG Score0.9276	64
Multimodal Reasoning	M3CoT (test)	Total Acc61.56	55
Multi-choice Visual Question Answering	A-OKVQA	Accuracy70.6	49

Showing 3 of 3 rows

Other info

Follow for update

@wizwand_team Discord