Faithful-MR1: Faithful Multimodal Reasoning via Anchoring and Reinforcing Visual Attention

About

Reinforcement learning with verifiable rewards (RLVR) has emerged as a promising paradigm for advancing complex reasoning in large language models, and recent work extends RLVR to multimodal large language models (MLLMs). This transfer, however, surfaces a two-part faithfulness challenge: the model must faithfully perceive task-relevant visual evidence and then faithfully use that evidence during reasoning. Existing approaches fall short on both parts, which leads to unsatisfactory gains on multimodal benchmarks. Specifically, existing perception supervision often operates on textual descriptions rather than natively on image regions, and faithful use is largely overlooked. The result is a perception-reasoning disconnect in which correctly perceived evidence is dropped or contradicted during reasoning. To close these gaps, we propose Faithful-MR1, a training framework that anchors and reinforces visual attention to address both halves of faithful multimodal reasoning. The Anchoring stage turns perception into an explicit pre-reasoning subtask, supervising a dedicated <Focus> token's attention directly against image regions rather than through textual descriptions. The Reinforcing stage exposes faithful use through counterfactual image intervention, rewarding answer-correct trajectories that concentrate visual attention where vision causally matters. Extensive experiments demonstrate that Faithful-MR1 outperforms recent multimodal reasoning baselines on both Qwen2.5-VL-Instruct 3B and 7B backbones while using substantially less training data.
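
The abstract does not give the exact loss or reward, so the following is only a minimal sketch of how the two stages could look, under stated assumptions. It assumes the Anchoring signal is the negative log of the attention mass that the <Focus> token places on image patches inside a target region, and that the Reinforcing bonus weights each reasoning step's visual attention by how much the token log-probability drops when the image is replaced with a counterfactual one. All function names, parameters, and the bonus scale are hypothetical and are not taken from the paper.

```python
import math

def anchoring_loss(focus_attention, region_mask, eps=1e-8):
    """Assumed anchoring objective: -log of the share of the focus token's
    attention that lands on image patches inside the target region."""
    total = sum(focus_attention)
    inside = sum(a for a, m in zip(focus_attention, region_mask) if m)
    return -math.log(inside / (total + eps) + eps)

def causal_weights(logp_real, logp_counterfactual):
    """Assumed proxy for 'where vision causally matters': the per-step drop in
    token log-probability under the counterfactual image, normalized to sum to 1."""
    drops = [max(0.0, r - c) for r, c in zip(logp_real, logp_counterfactual)]
    total = sum(drops)
    if total == 0.0:
        return [0.0] * len(drops)
    return [d / total for d in drops]

def faithful_use_reward(answer_correct, visual_attention_per_step,
                        logp_real, logp_counterfactual, bonus_scale=0.5):
    """Assumed reward shape: only answer-correct trajectories are rewarded, with a
    bonus for visual attention on the steps that depend most on the image."""
    if not answer_correct:
        return 0.0
    weights = causal_weights(logp_real, logp_counterfactual)
    bonus = sum(w * a for w, a in zip(weights, visual_attention_per_step))
    return 1.0 + bonus_scale * bonus

# Hypothetical toy values: three image patches, two reasoning steps.
loss = anchoring_loss([0.1, 0.7, 0.2], [False, True, False])
reward = faithful_use_reward(True, [0.9, 0.4], [-0.1, -0.2], [-0.1, -1.2])
```

In this toy setup only the second reasoning step loses probability under the counterfactual image, so the bonus depends solely on the visual attention at that step; a trajectory with an incorrect answer receives no reward regardless of its attention pattern.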

Changyuan Tian, Zhicong Lu, Huaxing Liu, Xiang Wang, Shuai Li, Yu Chen, Wenqian Lv, Zichuan Lin, Juncheng Diao, Deheng Ye • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MathVista	Accuracy 73.5	382
Mathematical Reasoning	WeMath	Accuracy 68.9	317
Mathematical Reasoning	MathVerse	Accuracy 51.9	266
Visual Question Answering	MMMU-Pro	Accuracy 39.7	26
General VQA	HallusionBench	Accuracy 69.8	20
Math Reasoning	MathVision	Accuracy 28.3	20
Math Reasoning	DynaMath	Worst Case Accuracy (WCA) 26.8	11
