GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training

About

Existing reasoning data curation pipelines score whole samples, treating every intermediate step as equally valuable. In reality, steps within a trace contribute very unevenly, and selecting reasoning data well requires assessing them individually. We present GRACE, a gradient-aligned curation method that views each reasoning trace as a sequence of optimization events and scores every step by two complementary signals: its alignment with the answer-oriented gradient direction, and its consistency with the preceding reasoning trajectory. Step-level scores are aggregated into a sample-level value for subset selection, using only the model's internal optimization signals and no external reward models or step annotations. To make this scalable, GRACE introduces a representation-level gradient proxy that estimates step-level alignment from token-level upstream signals in a single forward pass. Post-training Qwen3-VL-2B-Instruct on MMathCoT-1M, GRACE reaches 108.8% of the full-data performance with 20% of the data and retains 100.2% with only 5%, with subsets that transfer effectively across model backbones.
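The scoring and selection flow described above can be illustrated with a minimal sketch. The abstract does not give the scoring formulas, so everything concrete here is an assumption made for illustration: cosine similarity as the alignment measure, the running sum of earlier step vectors as the "preceding trajectory", an equal-weight mix of the two signals, mean aggregation to a sample value, and top-k selection. The function names (`score_steps`, `sample_value`, `select_subset`) are hypothetical, and the toy 2-D vectors stand in for whatever per-step signal the representation-level gradient proxy produces.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_steps(step_vecs: List[Vector], answer_vec: Vector,
                weight: float = 0.5) -> List[float]:
    """Score each step by two signals (both assumed to be cosines here):
    alignment with the answer-oriented direction, and consistency with
    the sum of the preceding step vectors."""
    scores = []
    running = [0.0] * len(answer_vec)  # sum of preceding step vectors
    for vec in step_vecs:
        alignment = cosine(vec, answer_vec)
        consistency = cosine(vec, running)  # 0.0 for the first step
        scores.append(weight * alignment + (1.0 - weight) * consistency)
        running = [r + x for r, x in zip(running, vec)]
    return scores


def sample_value(step_scores: List[float]) -> float:
    """Aggregate step scores into one sample-level value (mean, by assumption)."""
    return sum(step_scores) / len(step_scores) if step_scores else 0.0


def select_subset(values: List[float], fraction: float) -> List[int]:
    """Return the indices of the top `fraction` of samples by value."""
    k = max(1, round(fraction * len(values)))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(ranked[:k])


# Toy data: two traces, each a list of per-step vectors.
answer = [1.0, 0.0]
traces = [
    [[1.0, 0.0], [1.0, 0.0]],   # both steps point toward the answer
    [[0.0, 1.0], [-1.0, 0.0]],  # steps drift away from the answer
]
values = [sample_value(score_steps(t, answer)) for t in traces]
print(values)                      # [0.75, -0.25]
print(select_subset(values, 0.5))  # [0]
```

In this toy setup the first trace scores higher because its steps point along the answer direction and agree with one another, so it is the one kept at a 50% budget. The actual method may weight, normalize, or aggregate the signals differently.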

Junjie Li, Ziao Wang, NingXuan Ma, Jianghong Ma, Xiaofeng Zhang • 2026

Related benchmarks

Task	Dataset	Result
Multimodal Evaluation	MME	MME-P Score 1.51e+3	139
Multimodal Benchmarking	MMBench	Score 73.8	73
Mathematical Reasoning	MathVista	MathVista 54.3	55
Science Question Answering	SQA	SQA Score 85	36
Multimodal Reasoning	Multiple Evaluation Benchmarks Aggregate (test)	Relative Average Performance 105.2	24
Hallucination Detection	HallusionBench	Hallusion Score 48.5	20
Mathematical Vision Reasoning	MathVision	Score (MINI) 21.7	11
Multi-task and Multi-image Reasoning	MMT-Bench	SI Score 58.5	11

