
GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training

About

Existing reasoning data curation pipelines score whole samples, treating every intermediate step as equally valuable. In reality, steps within a trace contribute very unevenly, so selecting reasoning data well requires assessing each step individually. We present GRACE, a gradient-aligned curation method that views each reasoning trace as a sequence of optimization events and scores every step by two complementary signals: its alignment with the answer-oriented gradient direction, and its consistency with the preceding reasoning trajectory. Step-level scores are aggregated into a sample-level value for subset selection, using only the model's internal optimization signals, with no external reward models or step annotations. To make this scalable, GRACE introduces a representation-level gradient proxy that estimates step-level alignment from token-level upstream signals in a single forward pass. When Qwen3-VL-2B-Instruct is post-trained on MMathCoT-1M, GRACE-selected subsets reach 108.8% of full-data performance with 20% of the data and retain 100.2% with only 5%, and the selected subsets transfer effectively across model backbones.
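The abstract does not give GRACE's exact formulas, so what follows is only a minimal pure-Python sketch of the general pipeline it describes, under assumptions that are ours rather than the authors': (1) the "representation-level gradient proxy" is stood in for by the closed-form cross-entropy gradient with respect to a final linear output layer, (softmax(z) - onehot(y)) outer h, which needs only the hidden state h and logits z from a forward pass; (2) a step's gradient is the sum of its token gradients, and the answer-oriented direction is the analogous sum over answer tokens; (3) the two signals are cosine similarities combined with a hypothetical weight `alpha`; (4) the sample-level value is the mean step score, and selection keeps the top fraction of samples. All function names (`token_grad_proxy`, `step_scores`, `sample_value`, `select_subset`) are hypothetical.

```python
# Hypothetical sketch of step-level gradient-aligned scoring; NOT the authors' code.
import math
from typing import List, Optional, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_grad_proxy(hidden: Sequence[float], logits: Sequence[float], target: int) -> Vector:
    """Cross-entropy gradient w.r.t. a final linear layer W, where logits = W @ hidden.

    dL/dW = (softmax(logits) - onehot(target)) outer hidden, flattened row-major.
    Only forward-pass quantities are used, so no backward pass is needed.
    (Assumed stand-in for GRACE's representation-level proxy.)
    """
    residual = softmax(logits)
    residual[target] -= 1.0
    return [r * h for r in residual for h in hidden]


def sum_vectors(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise sum, e.g. of all token gradients belonging to one step."""
    return [sum(column) for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def step_scores(step_grads: Sequence[Vector], answer_grad: Vector, alpha: float = 0.5) -> List[float]:
    """Score each step by answer alignment and consistency with the preceding steps.

    alpha (hypothetical) weights alignment vs. consistency. The first step has no
    preceding trajectory, so it is scored by answer alignment alone.
    """
    scores: List[float] = []
    history: Optional[Vector] = None  # running sum of earlier step gradients
    for grad in step_grads:
        align = cosine(grad, answer_grad)
        if history is None:
            scores.append(align)
            history = list(grad)
        else:
            consist = cosine(grad, history)
            scores.append(alpha * align + (1.0 - alpha) * consist)
            history = [h + g for h, g in zip(history, grad)]
    return scores


def sample_value(scores: Sequence[float]) -> float:
    """Aggregate step scores into one sample-level value (assumed here: the mean)."""
    return sum(scores) / len(scores) if scores else 0.0


def select_subset(values: Sequence[float], fraction: float) -> List[int]:
    """Return indices of the top `fraction` of samples by value, in original order."""
    k = max(1, int(len(values) * fraction))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(ranked[:k])
```

For example, with step gradients pointing along, along, and against the answer direction, `step_scores([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]], [1.0, 0.0])` gives `[1.0, 1.0, -1.0]`, so the contradicting final step pulls the sample's mean value down to about 0.33. Using only the output-layer gradient keeps the proxy cheap, but it ignores every earlier layer; the paper's actual proxy may be constructed differently.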

Junjie Li, Ziao Wang, NingXuan Ma, Jianghong Ma, Xiaofeng Zhang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multimodal Evaluation | MME | MME-P Score | 1.51e+3 | 114 |
| Multimodal Benchmarking | MMBench | Score | 73.8 | 73 |
| Mathematical Reasoning | MathVista | MathVista | 54.3 | 55 |
| Science Question Answering | SQA | SQA Score | 85 | 26 |
| Multimodal Reasoning | Multiple Evaluation Benchmarks Aggregate (test) | Relative Average Performance | 105.2 | 24 |
| Hallucination Detection | HallusionBench | Hallusion Score | 48.5 | 20 |
| Mathematical Vision Reasoning | MathVision | Score (MINI) | 21.7 | 11 |
| Multi-task and Multi-image Reasoning | MMT-Bench | SI Score | 58.5 | 11 |
